Message ID: 20221021185916.1494314-1-vipinsh@google.com (mailing list archive)
State: New, archived
Series: [RFC] Add Hyperv extended hypercall support in KVM
On Fri, Oct 21, 2022, Vipin Sharma wrote:
> Hyperv hypercalls above 0x8000 are called as extended hypercalls as per
> Hyperv TLFS. Hypercall 0x8001 is used to enquire about available
> hypercalls by guest VMs.
>
> Add support for HvExtCallQueryCapabilities (0x8001) and
> HvExtCallGetBootZeroedMemory (0x8002) in KVM.
>
> A guest VM finds availability of HvExtCallQueryCapabilities (0x8001) by
> using CPUID.0x40000003.EBX BIT(20). If the bit is set then the guest VM
> make hypercall HvExtCallQueryCapabilities (0x8001) to know what all
> extended hypercalls are supported by hypervisor.
>
> A userspace VMM can query capability KVM_CAP_HYPERV_EXT_CALL_QUERY to
> know which extended hypercalls are supported in KVM. After which the
> userspace will enable capabilities for the guest VM.
>
> HvExtCallQueryCapabilities (0x8001) is handled by KVM in kernel,

Does this really need to be handled by KVM? I assume this is a rare operation,
e.g. done once during guest boot, so performance shouldn't be a concern. To
avoid breaking existing userspace, KVM can forward HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY
to userspace if and only if HV_ENABLE_EXTENDED_HYPERCALLS is enabled in CPUID,
but otherwise KVM can let userspace deal with the "is this enabled" check.

Aha! And if KVM "allows" all theoretically possible extended hypercalls, then
KVM will never need a capability to announce "support" for a new hypercall, i.e.
define KVM's ABI to be that KVM punts all possible extended hypercalls to userspace
if CPUID.0x40000003.EBX BIT(20) is enabled.

> whereas, HvExtCallGetBootZeroedMemory (0x8002) is passed to userspace
> for further action.
>
> Change-Id: Ib3709fadbf11f91be2842c8486bcbe755e09cbea

Drop gerrit's Change-Id when posting publicly.

If KVM punts the support checks to userspace, then the KVM side of things is very
minimal and future proof (unless Microsoft hoses us). E.g. with code deduplication
that should be moved to a prep patch:

---
 arch/x86/kvm/hyperv.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 0adf4a437e85..f9253249de00 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2138,6 +2138,12 @@ static void kvm_hv_hypercall_read_xmm(struct kvm_hv_hcall *hc)
 	kvm_fpu_put();
 }
 
+/*
+ * The TLFS carves out 64 possible extended hypercalls, numbered sequentially
+ * after the base capabilities extended hypercall.
+ */
+#define HV_EXT_CALL_MAX	(HV_EXT_CALL_QUERY_CAPABILITIES + 64)
+
 static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
 {
 	if (!hv_vcpu->enforce_cpuid)
@@ -2178,6 +2184,10 @@ static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
 	case HVCALL_SEND_IPI:
 		return hv_vcpu->cpuid_cache.enlightenments_eax &
 			HV_X64_CLUSTER_IPI_RECOMMENDED;
+	case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
+		return hv_vcpu->cpuid_cache.features_ebx &
+			HV_ENABLE_EXTENDED_HYPERCALLS;
+		break;
 	default:
 		break;
 	}
@@ -2270,14 +2280,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 			ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
 			break;
 		}
-		vcpu->run->exit_reason = KVM_EXIT_HYPERV;
-		vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
-		vcpu->run->hyperv.u.hcall.input = hc.param;
-		vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
-		vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
-		vcpu->arch.complete_userspace_io =
-				kvm_hv_hypercall_complete_userspace;
-		return 0;
+		goto hypercall_userspace_exit;
 	case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST:
 		if (unlikely(hc.var_cnt)) {
 			ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
@@ -2336,15 +2339,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 			ret = HV_STATUS_OPERATION_DENIED;
 			break;
 		}
-		vcpu->run->exit_reason = KVM_EXIT_HYPERV;
-		vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
-		vcpu->run->hyperv.u.hcall.input = hc.param;
-		vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
-		vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
-		vcpu->arch.complete_userspace_io =
-				kvm_hv_hypercall_complete_userspace;
-		return 0;
+		goto hypercall_userspace_exit;
 	}
+	case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
+		if (unlikely(hc.fast)) {
+			ret = HV_STATUS_INVALID_PARAMETER;
+			break;
+		}
+		goto hypercall_userspace_exit;
 	default:
 		ret = HV_STATUS_INVALID_HYPERCALL_CODE;
 		break;
@@ -2352,6 +2354,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 
 hypercall_complete:
 	return kvm_hv_hypercall_complete(vcpu, ret);
+hypercall_userspace_exit:
+	vcpu->run->exit_reason = KVM_EXIT_HYPERV;
+	vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
+	vcpu->run->hyperv.u.hcall.input = hc.param;
+	vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
+	vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
+	vcpu->arch.complete_userspace_io = kvm_hv_hypercall_complete_userspace;
+	return 0;
 }
 
 void kvm_hv_init_vm(struct kvm *kvm)
@@ -2494,6 +2504,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 
 	ent->ebx |= HV_POST_MESSAGES;
 	ent->ebx |= HV_SIGNAL_EVENTS;
+	ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
 
 	ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
 	ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;

base-commit: e18d6152ff0f41b7f01f9817372022df04e0d354
--
On Fri, Oct 21, 2022 at 1:13 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Oct 21, 2022, Vipin Sharma wrote:
> > [...]
> > HvExtCallQueryCapabilities (0x8001) is handled by KVM in kernel,
>
> Does this really need to be handled by KVM? I assume this is a rare operation,
> e.g. done once during guest boot, so performance shouldn't be a concern. To
> avoid breaking existing userspace, KVM can forward HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY
> to userspace if and only if HV_ENABLE_EXTENDED_HYPERCALLS is enabled in CPUID,
> but otherwise KVM can let userspace deal with the "is this enabled" check.

There are 4 more extended hypercalls mentioned in the TLFS but there is no
detail about them in the document. From the Linux source code, one of
the hypercalls, HvExtCallMemoryHeatHint (0x8003), is a repetitive call.
In the file drivers/hv/hv_balloon.c:

	status = hv_do_rep_hypercall(HV_EXT_CALL_MEMORY_HEAT_HINT,
				     nents, 0, hint, NULL);

This makes me a little bit wary that these hypercalls, or any future
hypercalls, can have a high calling frequency from a Windows guest. Also, it
is not clear which calls can or cannot be satisfied by userspace
alone.

So, I am not sure if the default exit to userspace for all of the
extended hypercalls will be future proof; therefore, I went with the
approach of only selectively exiting to userspace based on the hypercall.

> Aha! And if KVM "allows" all theoretically possible extended hypercalls, then
> KVM will never need a capability to announce "support" for a new hypercall, i.e.
> define KVM's ABI to be that KVM punts all possible extended hypercalls to userspace
> if CPUID.0x40000003.EBX BIT(20) is enabled.
>
> [...]
>
> base-commit: e18d6152ff0f41b7f01f9817372022df04e0d354
> --
On Fri, Oct 21, 2022, Vipin Sharma wrote:
> On Fri, Oct 21, 2022 at 1:13 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Oct 21, 2022, Vipin Sharma wrote:
> > > [...]
> > > HvExtCallQueryCapabilities (0x8001) is handled by KVM in kernel,
> >
> > Does this really need to be handled by KVM? I assume this is a rare operation,
> > e.g. done once during guest boot, so performance shouldn't be a concern. To
> > avoid breaking existing userspace, KVM can forward HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY
> > to userspace if and only if HV_ENABLE_EXTENDED_HYPERCALLS is enabled in CPUID,
> > but otherwise KVM can let userspace deal with the "is this enabled" check.
>
> There are 4 more extended hypercalls mentioned in the TLFS but there is no
> detail about them in the document. From the Linux source code, one of
> the hypercalls, HvExtCallMemoryHeatHint (0x8003), is a repetitive call.
> In the file drivers/hv/hv_balloon.c:
>
> 	status = hv_do_rep_hypercall(HV_EXT_CALL_MEMORY_HEAT_HINT,
> 				     nents, 0, hint, NULL);
>
> This makes me a little bit wary that these hypercalls, or any future
> hypercalls, can have a high calling frequency from a Windows guest. Also, it
> is not clear which calls can or cannot be satisfied by userspace
> alone.

If future support needs to be moved into KVM, e.g. for performance reasons, then
we can do that if necessary.

> So, I am not sure if the default exit to userspace for all of the
> extended hypercalls will be future proof; therefore, I went with the
> approach of only selectively exiting to userspace based on the hypercall.

But punting on everything _might_ be future proof, whereas the only way that
selectively exiting ends up being future proof is if no one ever wants to support
another extended hypercall.
Sean Christopherson <seanjc@google.com> writes:

> On Fri, Oct 21, 2022, Vipin Sharma wrote:
>> [...]
>> This makes me a little bit wary that these hypercalls, or any future
>> hypercalls, can have a high calling frequency from a Windows guest. Also, it
>> is not clear which calls can or cannot be satisfied by userspace
>> alone.
>
> If future support needs to be moved into KVM, e.g. for performance reasons, then
> we can do that if necessary.
>
>> So, I am not sure if the default exit to userspace for all of the
>> extended hypercalls will be future proof; therefore, I went with the
>> approach of only selectively exiting to userspace based on the hypercall.
>
> But punting on everything _might_ be future proof, whereas the only way that
> selectively exiting ends up being future proof is if no one ever wants to support
> another extended hypercall.

While some 'extended' hypercalls may indeed need to be handled in KVM,
there's no harm done in forwarding all unknown-to-KVM hypercalls to
userspace. The only issue I envision is how userspace would discover
which extended hypercalls are supported by KVM in case it (userspace) is
responsible for handling the HvExtCallQueryCapabilities call, which
returns the list of supported hypercalls. E.g. in case we decide to
implement HvExtCallMemoryHeatHint in KVM, how are we going to communicate
this to userspace?

Normally, a VMM discovers the availability of Hyper-V features through
KVM_GET_SUPPORTED_HV_CPUID, but extended hypercalls are not listed in
CPUID. This can always be solved by adding new KVM CAPs, of course.
Alternatively, we can add a single "KVM_CAP_HYPERV_EXT_CALL_QUERY"
which will just return the list of extended hypercalls supported by KVM
(which Vipin's patch adds anyway, to *set* the list instead).

TL;DR: handling HvExtCallQueryCapabilities and all unknown-to-KVM (now:
all) extended hypercalls in userspace sounds like the right approach.
On Mon, Oct 24, 2022, Vitaly Kuznetsov wrote:
> While some 'extended' hypercalls may indeed need to be handled in KVM,
> there's no harm done in forwarding all unknown-to-KVM hypercalls to
> userspace. The only issue I envision is how userspace would discover
> which extended hypercalls are supported by KVM in case it (userspace) is
> responsible for handling the HvExtCallQueryCapabilities call, which
> returns the list of supported hypercalls. E.g. in case we decide to
> implement HvExtCallMemoryHeatHint in KVM, how are we going to communicate
> this to userspace?
>
> Normally, a VMM discovers the availability of Hyper-V features through
> KVM_GET_SUPPORTED_HV_CPUID, but extended hypercalls are not listed in
> CPUID. This can always be solved by adding new KVM CAPs, of course.
> Alternatively, we can add a single "KVM_CAP_HYPERV_EXT_CALL_QUERY"
> which will just return the list of extended hypercalls supported by KVM
> (which Vipin's patch adds anyway, to *set* the list instead).

AIUI, the TLFS uses a 64-bit mask to enumerate which extended hypercalls are
supported, so a single CAP should be a perfect fit. And KVM can use the capability
to enumerate support for _and_ to allow userspace to enable in-kernel handling. E.g.

check():

	case KVM_CAP_HYPERV_EXT_CALL:
		return KVM_SUPPORTED_HYPERV_EXT_CALL;

enable():

	case KVM_CAP_HYPERV_EXT_CALL:
		r = -EINVAL;
		if (mask & ~KVM_SUPPORTED_HYPERV_EXT_CALL)
			break;

		mutex_lock(&kvm->lock);
		if (!kvm->created_vcpus) {
			to_kvm_hv(kvm)->ext_call = cap->args[0];
			r = 0;
		}
		mutex_unlock(&kvm->lock);

kvm_hv_hypercall():

	case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
		if (unlikely(hc.fast)) {
			ret = HV_STATUS_INVALID_PARAMETER;
			break;
		}
		if (!(hc.code & to_kvm_hv(vcpu->kvm)->ext_call))
			goto hypercall_userspace_exit;

		ret = kvm_hv_ext_hypercall(...);
		break;

That maintains backwards compatibility with "exit on everything" as userspace
still needs to opt-in to having KVM handle specific hypercalls in-kernel, and it
provides the necessary knob for userspace to tell KVM which hypercalls should be
allowed, i.e. ensures KVM doesn't violate HV_EXT_CALL_QUERY_CAPABILITIES.
On Mon, Oct 24, 2022 at 8:22 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Oct 24, 2022, Vitaly Kuznetsov wrote:
> > [...]
>
> AIUI, the TLFS uses a 64-bit mask to enumerate which extended hypercalls are
> supported, so a single CAP should be a perfect fit. And KVM can use the capability
> to enumerate support for _and_ to allow userspace to enable in-kernel handling. E.g.
>
> check():
>
> 	case KVM_CAP_HYPERV_EXT_CALL:
> 		return KVM_SUPPORTED_HYPERV_EXT_CALL;
>
> enable():
>
> 	case KVM_CAP_HYPERV_EXT_CALL:
> 		r = -EINVAL;
> 		if (mask & ~KVM_SUPPORTED_HYPERV_EXT_CALL)
> 			break;
>
> 		mutex_lock(&kvm->lock);
> 		if (!kvm->created_vcpus) {

Any reason for setting the capability only after vcpus are created?

Also, in my patch I wrote the ioctl at kvm_vcpu_ioctl_enable_cap() as
all of the hyperv related code was there, but since this capability is
a vm setting, not a per vcpu setting, should this be at kvm_vm_ioctl()
as a better choice?

> 			to_kvm_hv(kvm)->ext_call = cap->args[0];
> 			r = 0;
> 		}
> 		mutex_unlock(&kvm->lock);
>
> kvm_hv_hypercall():
>
> 	case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
> 		if (unlikely(hc.fast)) {
> 			ret = HV_STATUS_INVALID_PARAMETER;
> 			break;
> 		}
> 		if (!(hc.code & to_kvm_hv(vcpu->kvm)->ext_call))

It won't be directly this. There will be a mapping of hc.code to the
corresponding bit, and then "&" with ext_call.

> 			goto hypercall_userspace_exit;
>
> 		ret = kvm_hv_ext_hypercall(...);
> 		break;
>
> That maintains backwards compatibility with "exit on everything" as userspace
> still needs to opt-in to having KVM handle specific hypercalls in-kernel, and it
> provides the necessary knob for userspace to tell KVM which hypercalls should be
> allowed, i.e. ensures KVM doesn't violate HV_EXT_CALL_QUERY_CAPABILITIES.

So, should I send a version with a KVM capability similar to the above, or
for now just send the version which by default exits to userspace, and later,
whenever the need arises, a KVM capability can be added?
On Mon, Oct 24, 2022, Vipin Sharma wrote:
> On Mon, Oct 24, 2022 at 8:22 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > [...]
> >
> > 		mutex_lock(&kvm->lock);
> > 		if (!kvm->created_vcpus) {
>
> Any reason for setting the capability only after vcpus are created?

This only allows setting the capability _before_ vCPUs are created. Attempting
to set the cap after vCPUs are created gets rejected with -EINVAL. This
requirement means vCPUs don't need to take a lock to consume per-VM state, as KVM
prevents the state from changing once vCPUs are created.

> Also, in my patch I wrote the ioctl at kvm_vcpu_ioctl_enable_cap() as
> all of the hyperv related code was there, but since this capability is
> a vm setting, not a per vcpu setting, should this be at kvm_vm_ioctl()
> as a better choice?

Yep!

> > 	case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
> > 		[...]
> > 		if (!(hc.code & to_kvm_hv(vcpu->kvm)->ext_call))
>
> It won't be directly this. There will be a mapping of hc.code to the
> corresponding bit, and then "&" with ext_call.
>
> So, should I send a version with a KVM capability similar to the above,

No, the above was just a sketch of how we can extend support if necessary. In
general, we try to avoid creating _any_ APIs before they are strictly required.
For uAPIs, that's pretty much a hard rule.

> or for now just send the version which by default exits to userspace, and later,
> whenever the need arises, a KVM capability can be added?

This one please :-)
On Mon, Oct 24, 2022 at 12:36 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Oct 24, 2022, Vipin Sharma wrote:
> > Any reason for setting the capability only after vcpus are created?
>
> This only allows setting the capability _before_ vCPUs are created. Attempting
> to set the cap after vCPUs are created gets rejected with -EINVAL. This
> requirement means vCPUs don't need to take a lock to consume per-VM state, as KVM
> prevents the state from changing once vCPUs are created.

I totally misread the condition and didn't notice the '!' in the if()
statement. Thanks for the feedback.
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 3089ec352743..421279a61a9a 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -158,6 +158,9 @@
 #define HV_SHARED_GPA_BOUNDARY_ACTIVE	BIT(5)
 #define HV_SHARED_GPA_BOUNDARY_BITS	GENMASK(11, 6)
 
+/* Extended hypercalls supported by KVM */
+#define HV_EXT_CALL_QUERY_CAPABILITIES_MASK	BIT(0)
+
 enum hv_isolation_type {
 	HV_ISOLATION_TYPE_NONE	= 0,
 	HV_ISOLATION_TYPE_VBS	= 1,
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7551b6f9c31c..b1892ea39a23 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1041,6 +1041,8 @@ struct kvm_hv {
 	struct hv_partition_assist_pg *hv_pa_pg;
 	struct kvm_hv_syndbg hv_syndbg;
+
+	u64 extended_hypercalls_cap;
 };
 
 struct msr_bitmap_range {
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 0adf4a437e85..5f0b7d8789a8 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -32,6 +32,7 @@
 #include <linux/eventfd.h>
 
 #include <asm/apicdef.h>
+#include <asm/hyperv-tlfs.h>
 #include <trace/events/kvm.h>
 #include "trace.h"
 
@@ -2140,6 +2141,8 @@ static void kvm_hv_hypercall_read_xmm(struct kvm_hv_hcall *hc)
 
 static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
 {
+	struct kvm_hv *hv;
+
 	if (!hv_vcpu->enforce_cpuid)
 		return true;
 
@@ -2178,6 +2181,14 @@ static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
 	case HVCALL_SEND_IPI:
 		return hv_vcpu->cpuid_cache.enlightenments_eax &
 			HV_X64_CLUSTER_IPI_RECOMMENDED;
+	case HV_EXT_CALL_QUERY_CAPABILITIES:
+		return hv_vcpu->cpuid_cache.features_ebx &
+			HV_ENABLE_EXTENDED_HYPERCALLS;
+	case HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY:
+		hv = to_kvm_hv(hv_vcpu->vcpu->kvm);
+		return hv->extended_hypercalls_cap &
+			HV_EXT_CAPABILITY_GET_BOOT_ZEROED_MEMORY;
+		break;
 	default:
 		break;
 	}
@@ -2189,6 +2200,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
 	struct kvm_hv_hcall hc;
+	struct kvm_hv *hv;
 	u64 ret = HV_STATUS_SUCCESS;
 
 	/*
@@ -2345,6 +2357,30 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 				kvm_hv_hypercall_complete_userspace;
 		return 0;
 	}
+	case HV_EXT_CALL_QUERY_CAPABILITIES:
+		if (unlikely(hc.fast)) {
+			ret = HV_STATUS_INVALID_PARAMETER;
+			break;
+		}
+		hv = to_kvm_hv(hv_vcpu->vcpu->kvm);
+		if (kvm_vcpu_write_guest(vcpu, hc.outgpa,
+					 &hv->extended_hypercalls_cap,
+					 sizeof(hv->extended_hypercalls_cap)))
+			ret = HV_STATUS_OPERATION_DENIED;
+		break;
+	case HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY:
+		if (unlikely(hc.fast)) {
+			ret = HV_STATUS_INVALID_PARAMETER;
+			break;
+		}
+		vcpu->run->exit_reason = KVM_EXIT_HYPERV;
+		vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
+		vcpu->run->hyperv.u.hcall.input = hc.param;
+		vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
+		vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
+		vcpu->arch.complete_userspace_io =
+				kvm_hv_hypercall_complete_userspace;
+		return 0;
 	default:
 		ret = HV_STATUS_INVALID_HYPERCALL_CODE;
 		break;
@@ -2494,6 +2530,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 
 	ent->ebx |= HV_POST_MESSAGES;
 	ent->ebx |= HV_SIGNAL_EVENTS;
+	ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
 
 	ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
 	ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
@@ -2578,3 +2615,21 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 
 	return 0;
 }
+
+int kvm_hv_set_ext_call_cap(struct kvm_vcpu *vcpu, uint64_t cap)
+{
+	struct kvm_hv *hv;
+	int r;
+
+	if (cap & ~HV_EXT_CALL_QUERY_CAPABILITIES_MASK)
+		return -EINVAL;
+
+	r = kvm_hv_vcpu_init(vcpu);
+	if (r)
+		return r;
+
+	hv = to_kvm_hv(vcpu->kvm);
+	hv->extended_hypercalls_cap = cap;
+
+	return 0;
+}
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 1030b1b50552..b92f8abdbf0d 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -150,5 +150,6 @@ int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce);
 int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
 int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 		     struct kvm_cpuid_entry2 __user *entries);
+int kvm_hv_set_ext_call_cap(struct kvm_vcpu *vcpu, uint64_t cap);
 
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4bd5f8a751de..caca1f537f6d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4515,6 +4515,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_X86_NOTIFY_VMEXIT:
 		r = kvm_caps.has_notify_vmexit;
 		break;
+	case KVM_CAP_HYPERV_EXT_CALL_QUERY:
+		r = HV_EXT_CALL_QUERY_CAPABILITIES_MASK;
+		break;
 	default:
 		break;
 	}
@@ -5510,6 +5513,8 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 
 		kvm_update_pv_runtime(vcpu);
 		return 0;
+	case KVM_CAP_HYPERV_EXT_CALL_QUERY:
+		return kvm_hv_set_ext_call_cap(vcpu, cap->args[0]);
 	default:
 		return -EINVAL;
 	}
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index fdce7a4cfc6f..15ffc2c5d950 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -162,6 +162,7 @@ struct ms_hyperv_tsc_page {
 
 /* Extended hypercalls */
 #define HV_EXT_CALL_QUERY_CAPABILITIES	0x8001
+#define HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY	0x8002
 #define HV_EXT_CALL_MEMORY_HEAT_HINT	0x8003
 
 #define HV_FLUSH_ALL_PROCESSORS		BIT(0)
@@ -170,7 +171,8 @@ struct ms_hyperv_tsc_page {
 #define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT	BIT(3)
 
 /* Extended capability bits */
-#define HV_EXT_CAPABILITY_MEMORY_COLD_DISCARD_HINT BIT(8)
+#define HV_EXT_CAPABILITY_GET_BOOT_ZEROED_MEMORY	BIT(0)
+#define HV_EXT_CAPABILITY_MEMORY_COLD_DISCARD_HINT	BIT(8)
 
 enum HV_GENERIC_SET_FORMAT {
 	HV_GENERIC_SET_SPARSE_4K,
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..42860137e545 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_HYPERV_EXT_CALL_QUERY 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
Hyperv hypercalls above 0x8000 are called as extended hypercalls as per
Hyperv TLFS. Hypercall 0x8001 is used to enquire about available
hypercalls by guest VMs.

Add support for HvExtCallQueryCapabilities (0x8001) and
HvExtCallGetBootZeroedMemory (0x8002) in KVM.

A guest VM finds availability of HvExtCallQueryCapabilities (0x8001) by
using CPUID.0x40000003.EBX BIT(20). If the bit is set then the guest VM
make hypercall HvExtCallQueryCapabilities (0x8001) to know what all
extended hypercalls are supported by hypervisor.

A userspace VMM can query capability KVM_CAP_HYPERV_EXT_CALL_QUERY to
know which extended hypercalls are supported in KVM. After which the
userspace will enable capabilities for the guest VM.

HvExtCallQueryCapabilities (0x8001) is handled by KVM in kernel,
whereas, HvExtCallGetBootZeroedMemory (0x8002) is passed to userspace
for further action.

Change-Id: Ib3709fadbf11f91be2842c8486bcbe755e09cbea
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
Hi,

This is an RFC patch based on the previous discussion
https://lore.kernel.org/kvm/CAHVum0cbWBXUnJ4s32Yn=TfPXLypv_RRT6LmA_QoBHw3Y+kA7w@mail.gmail.com/#t

Things missing in this RFC patch which I will add when sending a proper patch:
1. Documentation
2. Selftest
3. Multiple smaller patches instead of one.

I also need suggestions regarding KVM_ENABLE_CAP usage in this patch. My
idea is userspace can query to know what all capabilities are supported
by KVM and based on that it can call KVM_ENABLE_CAP to enable only
select capabilities. Also userspace needs to enforce the hyperv CPUID
check by KVM_CAP_HYPERV_ENFORCE_CPUID to make sure these are enforced
(the hyperv default is to accept all, hv_check_hypercall_access()).

The current approach is storing capabilities given by userspace in
struct kvm_hv{}; I was not sure which would be the better place,
struct kvm_hv{} or struct kvm_vcpu_hv{}.

Thanks
Vipin

 arch/x86/include/asm/hyperv-tlfs.h |  3 ++
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/hyperv.c              | 55 ++++++++++++++++++++++++++++++
 arch/x86/kvm/hyperv.h              |  1 +
 arch/x86/kvm/x86.c                 |  5 +++
 include/asm-generic/hyperv-tlfs.h  |  4 ++-
 include/uapi/linux/kvm.h           |  1 +
 7 files changed, 70 insertions(+), 1 deletion(-)