
[RFC] Add Hyperv extended hypercall support in KVM

Message ID 20221021185916.1494314-1-vipinsh@google.com (mailing list archive)
State New, archived
Series [RFC] Add Hyperv extended hypercall support in KVM

Commit Message

Vipin Sharma Oct. 21, 2022, 6:59 p.m. UTC
Hyperv hypercalls with codes above 0x8000 are called extended hypercalls,
as per the Hyperv TLFS. Hypercall 0x8001 is used by guest VMs to enquire
about the available extended hypercalls.

Add support for HvExtCallQueryCapabilities (0x8001) and
HvExtCallGetBootZeroedMemory (0x8002) in KVM.

A guest VM discovers the availability of HvExtCallQueryCapabilities
(0x8001) via CPUID.0x40000003.EBX BIT(20). If the bit is set, the guest
VM makes the HvExtCallQueryCapabilities (0x8001) hypercall to learn which
extended hypercalls the hypervisor supports.

A userspace VMM can query the capability KVM_CAP_HYPERV_EXT_CALL_QUERY to
learn which extended hypercalls KVM supports, after which userspace can
enable the desired capabilities for the guest VM.

HvExtCallQueryCapabilities (0x8001) is handled by KVM in the kernel,
whereas HvExtCallGetBootZeroedMemory (0x8002) is passed to userspace
for further action.

Change-Id: Ib3709fadbf11f91be2842c8486bcbe755e09cbea
Signed-off-by: Vipin Sharma <vipinsh@google.com>
---

Hi,

This is an RFC patch based on the previous discussion
https://lore.kernel.org/kvm/CAHVum0cbWBXUnJ4s32Yn=TfPXLypv_RRT6LmA_QoBHw3Y+kA7w@mail.gmail.com/#t

Things missing from this RFC patch, which I will add when sending the
proper patch:
1. Documentation
2. Selftest
3. Multiple smaller patches instead of one.

I also need suggestions regarding KVM_ENABLE_CAP usage in this patch. My
idea is that userspace can query which capabilities KVM supports and,
based on that, call KVM_ENABLE_CAP to enable only select capabilities.
Userspace also needs to enable KVM_CAP_HYPERV_ENFORCE_CPUID so that the
hyperv CPUID checks are actually enforced (the hyperv default is to
accept all, see hv_check_hypercall_access()).
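
For illustration, a rough sketch of the userspace flow I have in mind
with this RFC (error handling omitted; vm_fd/vcpu_fd are placeholders
for the VMM's existing file descriptors):

	struct kvm_enable_cap cap = { 0 };
	long supported;

	/* Bitmask of extended hypercalls KVM can handle (this RFC's cap). */
	supported = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_HYPERV_EXT_CALL_QUERY);

	/* Opt in to the subset userspace wants KVM to handle in-kernel. */
	cap.cap = KVM_CAP_HYPERV_EXT_CALL_QUERY;
	cap.args[0] = supported;
	ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);

	/* Enforce the hyperv CPUID checks as well. */
	cap.cap = KVM_CAP_HYPERV_ENFORCE_CPUID;
	cap.args[0] = 1;
	ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);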

The current approach stores the capabilities given by userspace in
struct kvm_hv{}; I was not sure which is the better place, struct
kvm_hv{} or struct kvm_vcpu_hv{}.

Thanks
Vipin

 arch/x86/include/asm/hyperv-tlfs.h |  3 ++
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/hyperv.c              | 55 ++++++++++++++++++++++++++++++
 arch/x86/kvm/hyperv.h              |  1 +
 arch/x86/kvm/x86.c                 |  5 +++
 include/asm-generic/hyperv-tlfs.h  |  4 ++-
 include/uapi/linux/kvm.h           |  1 +
 7 files changed, 70 insertions(+), 1 deletion(-)

Comments

Sean Christopherson Oct. 21, 2022, 8:13 p.m. UTC | #1
On Fri, Oct 21, 2022, Vipin Sharma wrote:
> Hyperv hypercalls above 0x8000 are called as extended hypercalls as per
> Hyperv TLFS. Hypercall 0x8001 is used to enquire about available
> hypercalls by guest VMs.
> 
> Add support for HvExtCallQueryCapabilities (0x8001) and
> HvExtCallGetBootZeroedMemory (0x8002) in KVM.
> 
> A guest VM finds availability of HvExtCallQueryCapabilities (0x8001) by
> using CPUID.0x40000003.EBX BIT(20). If the bit is set then the guest VM
> make hypercall HvExtCallQueryCapabilities (0x8001) to know what all
> extended hypercalls are supported by hypervisor.
> 
> A userspace VMM can query capability KVM_CAP_HYPERV_EXT_CALL_QUERY to
> know which extended hypercalls are supported in KVM. After which the
> userspace will enable capabilities for the guest VM.
> 
> HvExtCallQueryCapabilities (0x8001) is handled by KVM in kernel,

Does this really need to be handled by KVM?  I assume this is a rare operation,
e.g. done once during guest boot, so performance shouldn't be a concern.  To
avoid breaking existing userspace, KVM can forward HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY
to userspace if and only if HV_ENABLE_EXTENDED_HYPERCALLS is enabled in CPUID,
but otherwise KVM can let userspace deal with the "is this enabled" check.

Aha!  And if KVM "allows" all theoretically possible extended hypercalls, then
KVM will never need a capability to announce "support" for a new hypercall, i.e.
define KVM's ABI to be that KVM punts all possible extended hypercalls to userspace
if CPUID.0x40000003.EBX BIT(20) is enabled.
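
E.g. the userspace side could be a bare-bones exit handler along these
lines (rough sketch; write_guest() and ext_cap_mask are made-up
placeholders for however the VMM writes guest memory and tracks the mask
of hypercalls it implements):

	case KVM_EXIT_HYPERV:
		if (run->hyperv.type == KVM_EXIT_HYPERV_HCALL) {
			u16 code = run->hyperv.u.hcall.input & 0xffff;
			u64 outgpa = run->hyperv.u.hcall.params[1];

			if (code == HV_EXT_CALL_QUERY_CAPABILITIES) {
				/* 64-bit mask of extended hypercalls the VMM implements */
				write_guest(outgpa, &ext_cap_mask, sizeof(ext_cap_mask));
				run->hyperv.u.hcall.result = HV_STATUS_SUCCESS;
			} else {
				run->hyperv.u.hcall.result = HV_STATUS_INVALID_HYPERCALL_CODE;
			}
		}
		break;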

> whereas, HvExtCallGetBootZeroedMemory (0x8002) is passed to userspace
> for further action.
> 
> Change-Id: Ib3709fadbf11f91be2842c8486bcbe755e09cbea

Drop gerrit's Change-Id when posting publicly.

If KVM punts the support checks to userspace, then the KVM side of things is very
minimal and future proof (unless Microsoft hoses us).  E.g. with code deduplication
that should be moved to a prep patch:

---
 arch/x86/kvm/hyperv.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 0adf4a437e85..f9253249de00 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2138,6 +2138,12 @@ static void kvm_hv_hypercall_read_xmm(struct kvm_hv_hcall *hc)
 	kvm_fpu_put();
 }
 
+/*
+ * The TLFS carves out 64 possible extended hypercalls, numbered sequentially
+ * after the base capabilities extended hypercall.
+ */
+#define HV_EXT_CALL_MAX (HV_EXT_CALL_QUERY_CAPABILITIES + 64)
+
 static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
 {
 	if (!hv_vcpu->enforce_cpuid)
@@ -2178,6 +2184,10 @@ static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
 	case HVCALL_SEND_IPI:
 		return hv_vcpu->cpuid_cache.enlightenments_eax &
 			HV_X64_CLUSTER_IPI_RECOMMENDED;
+	case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
+		return hv_vcpu->cpuid_cache.features_ebx &
+		       HV_ENABLE_EXTENDED_HYPERCALLS;
+		break;
 	default:
 		break;
 	}
@@ -2270,14 +2280,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 			ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
 			break;
 		}
-		vcpu->run->exit_reason = KVM_EXIT_HYPERV;
-		vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
-		vcpu->run->hyperv.u.hcall.input = hc.param;
-		vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
-		vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
-		vcpu->arch.complete_userspace_io =
-				kvm_hv_hypercall_complete_userspace;
-		return 0;
+		goto hypercall_userspace_exit;
 	case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST:
 		if (unlikely(hc.var_cnt)) {
 			ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
@@ -2336,15 +2339,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 			ret = HV_STATUS_OPERATION_DENIED;
 			break;
 		}
-		vcpu->run->exit_reason = KVM_EXIT_HYPERV;
-		vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
-		vcpu->run->hyperv.u.hcall.input = hc.param;
-		vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
-		vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
-		vcpu->arch.complete_userspace_io =
-				kvm_hv_hypercall_complete_userspace;
-		return 0;
+		goto hypercall_userspace_exit;
 	}
+	case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
+		if (unlikely(hc.fast)) {
+			ret = HV_STATUS_INVALID_PARAMETER;
+			break;
+		}
+		goto hypercall_userspace_exit;
 	default:
 		ret = HV_STATUS_INVALID_HYPERCALL_CODE;
 		break;
@@ -2352,6 +2354,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 
 hypercall_complete:
 	return kvm_hv_hypercall_complete(vcpu, ret);
+hypercall_userspace_exit:
+	vcpu->run->exit_reason = KVM_EXIT_HYPERV;
+	vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
+	vcpu->run->hyperv.u.hcall.input = hc.param;
+	vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
+	vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
+	vcpu->arch.complete_userspace_io = kvm_hv_hypercall_complete_userspace;
+	return 0;
 }
 
 void kvm_hv_init_vm(struct kvm *kvm)
@@ -2494,6 +2504,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 
 			ent->ebx |= HV_POST_MESSAGES;
 			ent->ebx |= HV_SIGNAL_EVENTS;
+			ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
 
 			ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
 			ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;

base-commit: e18d6152ff0f41b7f01f9817372022df04e0d354
--
Vipin Sharma Oct. 21, 2022, 9:51 p.m. UTC | #2
On Fri, Oct 21, 2022 at 1:13 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Oct 21, 2022, Vipin Sharma wrote:
> > Hyperv hypercalls above 0x8000 are called as extended hypercalls as per
> > Hyperv TLFS. Hypercall 0x8001 is used to enquire about available
> > hypercalls by guest VMs.
> >
> > Add support for HvExtCallQueryCapabilities (0x8001) and
> > HvExtCallGetBootZeroedMemory (0x8002) in KVM.
> >
> > A guest VM finds availability of HvExtCallQueryCapabilities (0x8001) by
> > using CPUID.0x40000003.EBX BIT(20). If the bit is set then the guest VM
> > make hypercall HvExtCallQueryCapabilities (0x8001) to know what all
> > extended hypercalls are supported by hypervisor.
> >
> > A userspace VMM can query capability KVM_CAP_HYPERV_EXT_CALL_QUERY to
> > know which extended hypercalls are supported in KVM. After which the
> > userspace will enable capabilities for the guest VM.
> >
> > HvExtCallQueryCapabilities (0x8001) is handled by KVM in kernel,
>
> Does this really need to be handle by KVM?  I assume this is a rare operation,
> e.g. done once during guest boot, so performance shouldn't be a concern.  To
> avoid breaking existing userspace, KVM can forward HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY
> to userspace if and only if HV_ENABLE_EXTENDED_HYPERCALLS is enabled in CPUID,
> but otherwise KVM can let userspace deal with the "is this enabled" check.

There are 4 more extended hypercalls mentioned in the TLFS but there are
no details about them in the document. From the Linux source code, one of
the hypercalls, HvExtCallMemoryHeatHint (0x8003), is a repetitive call;
in drivers/hv/hv_balloon.c:

          status = hv_do_rep_hypercall(HV_EXT_CALL_MEMORY_HEAT_HINT,
                                       nents, 0, hint, NULL);

This makes me a little bit wary that these hypercalls, or any future
hypercalls, could be called at high frequency by a Windows guest. Also,
it is not clear which calls can or cannot be satisfied by userspace
alone.

So, I am not sure if the default exit to userspace for all of the
extended hypercalls will be future proof; therefore, I went with the
approach of only selectively exiting to userspace based on the hypercall.

>
> Aha!  And if KVM "allows" all theoretically possible extended hypercalls, then
> KVM will never need a capability to announce "support" for a new hypercall, i.e.
> define KVM's ABI to be that KVM punts all possible extended hypercalls to userspace
> if CPUID.0x40000003.EBX BIT(20) is enabled.
>
> > whereas, HvExtCallGetBootZeroedMemory (0x8002) is passed to userspace
> > for further action.
> >
> > Change-Id: Ib3709fadbf11f91be2842c8486bcbe755e09cbea
>
> Drop gerrit's Change-Id when posting publicly.
>
> If KVM punts the support checks to userspace, then the KVM side of things is very
> minimal and future proof (unless Microsoft hoses us).  E.g. with code deduplication
> that should be moved to a prep patch:
>
> ---
>  arch/x86/kvm/hyperv.c | 43 +++++++++++++++++++++++++++----------------
>  1 file changed, 27 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 0adf4a437e85..f9253249de00 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -2138,6 +2138,12 @@ static void kvm_hv_hypercall_read_xmm(struct kvm_hv_hcall *hc)
>         kvm_fpu_put();
>  }
>
> +/*
> + * The TLFS carves out 64 possible extended hypercalls, numbered sequentially
> + * after the base capabilities extended hypercall.
> + */
> +#define HV_EXT_CALL_MAX (HV_EXT_CALL_QUERY_CAPABILITIES + 64)
> +
>  static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
>  {
>         if (!hv_vcpu->enforce_cpuid)
> @@ -2178,6 +2184,10 @@ static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
>         case HVCALL_SEND_IPI:
>                 return hv_vcpu->cpuid_cache.enlightenments_eax &
>                         HV_X64_CLUSTER_IPI_RECOMMENDED;
> +       case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
> +               return hv_vcpu->cpuid_cache.features_ebx &
> +                      HV_ENABLE_EXTENDED_HYPERCALLS;
> +               break;
>         default:
>                 break;
>         }
> @@ -2270,14 +2280,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>                         ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
>                         break;
>                 }
> -               vcpu->run->exit_reason = KVM_EXIT_HYPERV;
> -               vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
> -               vcpu->run->hyperv.u.hcall.input = hc.param;
> -               vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
> -               vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
> -               vcpu->arch.complete_userspace_io =
> -                               kvm_hv_hypercall_complete_userspace;
> -               return 0;
> +               goto hypercall_userspace_exit;
>         case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST:
>                 if (unlikely(hc.var_cnt)) {
>                         ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
> @@ -2336,15 +2339,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>                         ret = HV_STATUS_OPERATION_DENIED;
>                         break;
>                 }
> -               vcpu->run->exit_reason = KVM_EXIT_HYPERV;
> -               vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
> -               vcpu->run->hyperv.u.hcall.input = hc.param;
> -               vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
> -               vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
> -               vcpu->arch.complete_userspace_io =
> -                               kvm_hv_hypercall_complete_userspace;
> -               return 0;
> +               goto hypercall_userspace_exit;
>         }
> +       case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
> +               if (unlikely(hc.fast)) {
> +                       ret = HV_STATUS_INVALID_PARAMETER;
> +                       break;
> +               }
> +               goto hypercall_userspace_exit;
>         default:
>                 ret = HV_STATUS_INVALID_HYPERCALL_CODE;
>                 break;
> @@ -2352,6 +2354,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>
>  hypercall_complete:
>         return kvm_hv_hypercall_complete(vcpu, ret);
> +hypercall_userspace_exit:
> +       vcpu->run->exit_reason = KVM_EXIT_HYPERV;
> +       vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
> +       vcpu->run->hyperv.u.hcall.input = hc.param;
> +       vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
> +       vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
> +       vcpu->arch.complete_userspace_io = kvm_hv_hypercall_complete_userspace;
> +       return 0;
>  }
>
>  void kvm_hv_init_vm(struct kvm *kvm)
> @@ -2494,6 +2504,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>
>                         ent->ebx |= HV_POST_MESSAGES;
>                         ent->ebx |= HV_SIGNAL_EVENTS;
> +                       ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
>
>                         ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
>                         ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
>
> base-commit: e18d6152ff0f41b7f01f9817372022df04e0d354
> --
>
Sean Christopherson Oct. 21, 2022, 10:04 p.m. UTC | #3
On Fri, Oct 21, 2022, Vipin Sharma wrote:
> On Fri, Oct 21, 2022 at 1:13 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Fri, Oct 21, 2022, Vipin Sharma wrote:
> > > Hyperv hypercalls above 0x8000 are called as extended hypercalls as per
> > > Hyperv TLFS. Hypercall 0x8001 is used to enquire about available
> > > hypercalls by guest VMs.
> > >
> > > Add support for HvExtCallQueryCapabilities (0x8001) and
> > > HvExtCallGetBootZeroedMemory (0x8002) in KVM.
> > >
> > > A guest VM finds availability of HvExtCallQueryCapabilities (0x8001) by
> > > using CPUID.0x40000003.EBX BIT(20). If the bit is set then the guest VM
> > > make hypercall HvExtCallQueryCapabilities (0x8001) to know what all
> > > extended hypercalls are supported by hypervisor.
> > >
> > > A userspace VMM can query capability KVM_CAP_HYPERV_EXT_CALL_QUERY to
> > > know which extended hypercalls are supported in KVM. After which the
> > > userspace will enable capabilities for the guest VM.
> > >
> > > HvExtCallQueryCapabilities (0x8001) is handled by KVM in kernel,
> >
> > Does this really need to be handle by KVM?  I assume this is a rare operation,
> > e.g. done once during guest boot, so performance shouldn't be a concern.  To
> > avoid breaking existing userspace, KVM can forward HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY
> > to userspace if and only if HV_ENABLE_EXTENDED_HYPERCALLS is enabled in CPUID,
> > but otherwise KVM can let userspace deal with the "is this enabled" check.
> 
> There are 4 more extended hypercalls mentioned in TLFS but there is no
> detail about them in the document. From the linux source code one of
> the hypercall HvExtCallMemoryHeatHint (0x8003) is a repetitive call.
> In the file drivers/hv/hv_balloon.c
>           status = hv_do_rep_hypercall(HV_EXT_CALL_MEMORY_HEAT_HINT,
> nents, 0,  hint, NULL);
> 
> This makes me a little bit wary that these hypercalls or any future
> hypercalls can have high calling frequency by Windows guest. Also, it
> is not clear which calls can or cannot be satisfied by userspace
> alone.

If future support needs to be moved into KVM, e.g. for performance reasons, then
we can do that if necessary.  

> So, I am not sure if the default exit to userspace for all of the
> extended hypercalls will be future proof, therefore, I went with the
> approach of only selectively exiting to userspace based on hypercall.

But punting on everything _might_ be future proof, whereas the only way that
selectively exiting ends up being future proof is if no one ever wants to support
another extended hypercall.
Vitaly Kuznetsov Oct. 24, 2022, 11:52 a.m. UTC | #4
Sean Christopherson <seanjc@google.com> writes:

> On Fri, Oct 21, 2022, Vipin Sharma wrote:
>> On Fri, Oct 21, 2022 at 1:13 PM Sean Christopherson <seanjc@google.com> wrote:
>> >
>> > On Fri, Oct 21, 2022, Vipin Sharma wrote:
>> > > Hyperv hypercalls above 0x8000 are called as extended hypercalls as per
>> > > Hyperv TLFS. Hypercall 0x8001 is used to enquire about available
>> > > hypercalls by guest VMs.
>> > >
>> > > Add support for HvExtCallQueryCapabilities (0x8001) and
>> > > HvExtCallGetBootZeroedMemory (0x8002) in KVM.
>> > >
>> > > A guest VM finds availability of HvExtCallQueryCapabilities (0x8001) by
>> > > using CPUID.0x40000003.EBX BIT(20). If the bit is set then the guest VM
>> > > make hypercall HvExtCallQueryCapabilities (0x8001) to know what all
>> > > extended hypercalls are supported by hypervisor.
>> > >
>> > > A userspace VMM can query capability KVM_CAP_HYPERV_EXT_CALL_QUERY to
>> > > know which extended hypercalls are supported in KVM. After which the
>> > > userspace will enable capabilities for the guest VM.
>> > >
>> > > HvExtCallQueryCapabilities (0x8001) is handled by KVM in kernel,
>> >
>> > Does this really need to be handle by KVM?  I assume this is a rare operation,
>> > e.g. done once during guest boot, so performance shouldn't be a concern.  To
>> > avoid breaking existing userspace, KVM can forward HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY
>> > to userspace if and only if HV_ENABLE_EXTENDED_HYPERCALLS is enabled in CPUID,
>> > but otherwise KVM can let userspace deal with the "is this enabled" check.
>> 
>> There are 4 more extended hypercalls mentioned in TLFS but there is no
>> detail about them in the document. From the linux source code one of
>> the hypercall HvExtCallMemoryHeatHint (0x8003) is a repetitive call.
>> In the file drivers/hv/hv_balloon.c
>>           status = hv_do_rep_hypercall(HV_EXT_CALL_MEMORY_HEAT_HINT,
>> nents, 0,  hint, NULL);
>> 
>> This makes me a little bit wary that these hypercalls or any future
>> hypercalls can have high calling frequency by Windows guest. Also, it
>> is not clear which calls can or cannot be satisfied by userspace
>> alone.
>
> If future support needs to be moved into KVM, e.g. for performance reasons, then
> we can do that if necessary.  
>
>> So, I am not sure if the default exit to userspace for all of the
>> extended hypercalls will be future proof, therefore, I went with the
>> approach of only selectively exiting to userspace based on hypercall.
>
> But punting on everything _might_ be future proof, whereas the only way that
> selectively exiting ends up being future proof is if no one ever wants to support
> another extended hypercall.

While some 'extended' hypercalls may indeed need to be handled in KVM,
there's no harm done in forwarding all unknown-to-KVM hypercalls to
userspace. The only issue I envision is how userspace would discover
which extended hypercalls are supported by KVM in case it (userspace) is
responsible for handling the HvExtCallQueryCapabilities call, which
returns the list of supported hypercalls. E.g. in case we decide to implement
HvExtCallMemoryHeatHint in KVM, how are we going to communicate this to
userspace?

Normally, the VMM discovers the availability of Hyper-V features through
KVM_GET_SUPPORTED_HV_CPUID, but extended hypercalls are not listed in
CPUID. This can always be solved by adding new KVM CAPs, of course.
Alternatively, we can add a single "KVM_CAP_HYPERV_EXT_CALL_QUERY" which
will just return the list of extended hypercalls supported by KVM (which
Vipin's patch adds anyway, but to *set* the list instead).

TL;DR: handling HvExtCallQueryCapabilities and all unknown-to-kvm (now:
all) extended hypercalls in userspace sounds like the right approach.
Sean Christopherson Oct. 24, 2022, 3:22 p.m. UTC | #5
On Mon, Oct 24, 2022, Vitaly Kuznetsov wrote:
> While some 'extended' hypercalls may indeed need to be handled in KVM,
> there's no harm done in forwarding all unknown-to-KVM hypercalls to
> userspace. The only issue I envision is how would userspace discover
> which extended hypercalls are supported by KVM in case it (userspace) is
> responsible for handling HvExtCallQueryCapabilities call which returns
> the list of supported hypercalls. E.g. in case we decide to implement
> HvExtCallMemoryHeatHint in KVM, how are we going to communicate this to
> userspace?
> 
> Normally, VMM discovers the availability of Hyper-V features through
> KVM_GET_SUPPORTED_HV_CPUID but extended hypercalls are not listed in
> CPUID. This can be always be solved by adding new KVM CAPs of
> course. Alternatively, we can add a single
> "KVM_CAP_HYPERV_EXT_CALL_QUERY" which will just return the list of
> extended hypercalls supported by KVM (which Vipin's patch adds anyway to
> *set* the list instead).

AIUI, the TLFS uses a 64-bit mask to enumerate which extended hypercalls are
supported, so a single CAP should be a perfect fit.  And KVM can use the capability
to enumerate support for _and_ to allow userspace to enable in-kernel handling.  E.g.

check():
	case KVM_CAP_HYPERV_EXT_CALL:
		return KVM_SUPPORTED_HYPERV_EXT_CALL;


enable():

	case KVM_CAP_HYPERV_EXT_CALL:
		r = -EINVAL;
		if (mask & ~KVM_SUPPORTED_HYPERV_EXT_CALL)
			break;

		mutex_lock(&kvm->lock);
		if (!kvm->created_vcpus) {
			to_kvm_hv(kvm)->ext_call = cap->args[0];
			r = 0;
		}
		mutex_unlock(&kvm->lock);

kvm_hv_hypercall()


	case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
		if (unlikely(hc.fast)) {
			ret = HV_STATUS_INVALID_PARAMETER;
			break;
		}
		if (!(hc.code & to_kvm_hv(vcpu->kvm)->ext_call))
			goto hypercall_userspace_exit;

		ret = kvm_hv_ext_hypercall(...)
		break;


That maintains backwards compatibility with "exit on everything" as userspace
still needs to opt-in to having KVM handle specific hypercalls in-kernel, and it
provides the necessary knob for userspace to tell KVM which hypercalls should be
allowed, i.e. ensures KVM doesn't violate HV_EXT_CALL_QUERY_CAPABILITIES.
Vipin Sharma Oct. 24, 2022, 6:29 p.m. UTC | #6
On Mon, Oct 24, 2022 at 8:22 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Oct 24, 2022, Vitaly Kuznetsov wrote:
> > While some 'extended' hypercalls may indeed need to be handled in KVM,
> > there's no harm done in forwarding all unknown-to-KVM hypercalls to
> > userspace. The only issue I envision is how would userspace discover
> > which extended hypercalls are supported by KVM in case it (userspace) is
> > responsible for handling HvExtCallQueryCapabilities call which returns
> > the list of supported hypercalls. E.g. in case we decide to implement
> > HvExtCallMemoryHeatHint in KVM, how are we going to communicate this to
> > userspace?
> >
> > Normally, VMM discovers the availability of Hyper-V features through
> > KVM_GET_SUPPORTED_HV_CPUID but extended hypercalls are not listed in
> > CPUID. This can be always be solved by adding new KVM CAPs of
> > course. Alternatively, we can add a single
> > "KVM_CAP_HYPERV_EXT_CALL_QUERY" which will just return the list of
> > extended hypercalls supported by KVM (which Vipin's patch adds anyway to
> > *set* the list instead).
>
> AIUI, the TLFS uses a 64-bit mask to enumerate which extended hypercalls are
> supported, so a single CAP should be a perfect fit.  And KVM can use the capability
> to enumerate support for _and_ to allow userspace to enable in-kernel handling.  E.g.
>
> check():
>         case KVM_CAP_HYPERV_EXT_CALL:
>                 return KVM_SUPPORTED_HYPERV_EXT_CALL;
>
>
> enable():
>
>         case KVM_CAP_HYPERV_EXT_CALL:
>                 r = -EINVAL;
>                 if (mask & ~KVM_SUPPORTED_HYPERV_EXT_CALL)
>                         break;
>
>                 mutex_lock(&kvm->lock);
>                 if (!kvm->created_vcpus) {

Any reason for setting the capability only after vcpus are created?
Also, in my patch I handled the ioctl in kvm_vcpu_ioctl_enable_cap()
since all of the hyperv related code was there, but since this capability
is a VM setting, not a per-vCPU setting, would kvm_vm_ioctl() be a better
choice?

>                         to_kvm_hv(kvm)->ext_call = cap->args[0];
>                         r = 0;
>                 }
>                 mutex_unlock(&kvm->lock);
>
> kvm_hv_hypercall()
>
>
>         case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
>                 if (unlikely(hc.fast)) {
>                         ret = HV_STATUS_INVALID_PARAMETER;
>                         break;
>                 }
>                 if (!(hc.code & to_kvm_hv(vcpu->kvm)->ext_call))

It won't be exactly this; there will be a mapping from hc.code to the
corresponding bit, which is then ANDed ("&") with ext_call.
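
Something like this (just a sketch using the defines from this patch;
kvm_hv_ext_call_to_cap() is a made-up helper and more cases would be
added as in-kernel support grows):

	/* Map an extended hypercall code to its capability bit, 0 if unhandled. */
	static u64 kvm_hv_ext_call_to_cap(u16 code)
	{
		switch (code) {
		case HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY:
			return HV_EXT_CAPABILITY_GET_BOOT_ZEROED_MEMORY;
		default:
			return 0;
		}
	}

and in kvm_hv_hypercall():

		if (!(kvm_hv_ext_call_to_cap(hc.code) & to_kvm_hv(vcpu->kvm)->ext_call))
			goto hypercall_userspace_exit;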


>                         goto hypercall_userspace_exit;
>
>                 ret = kvm_hv_ext_hypercall(...)
>                 break;
>
>
> That maintains backwards compatibility with "exit on everything" as userspace
> still needs to opt-in to having KVM handle specific hypercalls in-kernel, and it
> provides the necessary knob for userspace to tell KVM which hypercalls should be
> allowed, i.e. ensures KVM doesn't violate HV_EXT_CALL_QUERY_CAPABILITIES.

So, should I send a version with a KVM capability similar to the above,
or for now just send the version which exits to userspace by default,
with the KVM capability added later whenever the need arises?
Sean Christopherson Oct. 24, 2022, 7:36 p.m. UTC | #7
On Mon, Oct 24, 2022, Vipin Sharma wrote:
> On Mon, Oct 24, 2022 at 8:22 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, Oct 24, 2022, Vitaly Kuznetsov wrote:
> > > While some 'extended' hypercalls may indeed need to be handled in KVM,
> > > there's no harm done in forwarding all unknown-to-KVM hypercalls to
> > > userspace. The only issue I envision is how would userspace discover
> > > which extended hypercalls are supported by KVM in case it (userspace) is
> > > responsible for handling HvExtCallQueryCapabilities call which returns
> > > the list of supported hypercalls. E.g. in case we decide to implement
> > > HvExtCallMemoryHeatHint in KVM, how are we going to communicate this to
> > > userspace?
> > >
> > > Normally, VMM discovers the availability of Hyper-V features through
> > > KVM_GET_SUPPORTED_HV_CPUID but extended hypercalls are not listed in
> > > CPUID. This can be always be solved by adding new KVM CAPs of
> > > course. Alternatively, we can add a single
> > > "KVM_CAP_HYPERV_EXT_CALL_QUERY" which will just return the list of
> > > extended hypercalls supported by KVM (which Vipin's patch adds anyway to
> > > *set* the list instead).
> >
> > AIUI, the TLFS uses a 64-bit mask to enumerate which extended hypercalls are
> > supported, so a single CAP should be a perfect fit.  And KVM can use the capability
> > to enumerate support for _and_ to allow userspace to enable in-kernel handling.  E.g.
> >
> > check():
> >         case KVM_CAP_HYPERV_EXT_CALL:
> >                 return KVM_SUPPORTED_HYPERV_EXT_CALL;
> >
> >
> > enable():
> >
> >         case KVM_CAP_HYPERV_EXT_CALL:
> >                 r = -EINVAL;
> >                 if (mask & ~KVM_SUPPORTED_HYPERV_EXT_CALL)
> >                         break;
> >
> >                 mutex_lock(&kvm->lock);
> >                 if (!kvm->created_vcpus) {
> 
> Any reason for setting capability only after vcpus are created?

This only allows setting the capability _before_ vCPUs are created.  Attempting
to set the cap after vCPUs are created gets rejected with -EINVAL.  This
requirement means vCPUs don't need to take a lock to consume per-VM state, as KVM
prevents the state from changing once vCPUs are created.

> Also, in my patch I wrote the ioctl at kvm_vcpu_ioctl_enable_cap() as
> all of the hyperv related code was there but since this capability is
> a vm setting not a per vcpu setting, should this be at  kvm_vm_ioctl()
> as a better choice?

Yep!

> >                         to_kvm_hv(kvm)->ext_call = cap->args[0];
> >                         r = 0;
> >                 }
> >                 mutex_unlock(&kvm->lock);
> >
> > kvm_hv_hypercall()
> >
> >
> >         case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
> >                 if (unlikely(hc.fast)) {
> >                         ret = HV_STATUS_INVALID_PARAMETER;
> >                         break;
> >                 }
> >                 if (!(hc.code & to_kvm_hv(vcpu->kvm)->ext_call))
> 
> It won't be directly this. There will be a mapping of hc.code to the
> corresponding bit and then "&" with ext_call.
> 
> 
> >                         goto hypercall_userspace_exit;
> >
> >                 ret = kvm_hv_ext_hypercall(...)
> >                 break;
> >
> >
> > That maintains backwards compatibility with "exit on everything" as userspace
> > still needs to opt-in to having KVM handle specific hypercalls in-kernel, and it
> > provides the necessary knob for userspace to tell KVM which hypercalls should be
> > allowed, i.e. ensures KVM doesn't violate HV_EXT_CALL_QUERY_CAPABILITIES.
> 
> So, should I send a version with KVM capability similar to above

No, the above was just a sketch of how we can extend support if necessary.  In
general, we try to avoid creating _any_ APIs before they are strictly required.
For uAPIs, that's pretty much a hard rule.

> or for now just send the version which by default exit to userspace and later
> whenever the need arises KVM capability can be added then?

This one please :-)
Vipin Sharma Oct. 24, 2022, 8:24 p.m. UTC | #8
On Mon, Oct 24, 2022 at 12:36 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Oct 24, 2022, Vipin Sharma wrote:
> > On Mon, Oct 24, 2022 at 8:22 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Mon, Oct 24, 2022, Vitaly Kuznetsov wrote:
> > > enable():
> > >
> > >         case KVM_CAP_HYPERV_EXT_CALL:
> > >                 r = -EINVAL;
> > >                 if (mask & ~KVM_SUPPORTED_HYPERV_EXT_CALL)
> > >                         break;
> > >
> > >                 mutex_lock(&kvm->lock);
> > >                 if (!kvm->created_vcpus) {
> >
> > Any reason for setting capability only after vcpus are created?
>
> This only allows setting the capability _before_ vCPUs are created.  Attempting
> to set the cap after vCPUs are created gets rejected with -EINVAL.  This
> requirement means vCPUs don't need to take a lock to consume per-VM state, as KVM
> prevents the state from changing once vCPUs are created.

I totally misread the condition and didn't notice the '!' in the if()
statement.

Thanks for the feedback.

Patch

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 3089ec352743..421279a61a9a 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -158,6 +158,9 @@ 
 #define HV_SHARED_GPA_BOUNDARY_ACTIVE			BIT(5)
 #define HV_SHARED_GPA_BOUNDARY_BITS			GENMASK(11, 6)
 
+/* Extended hypercalls supported by KVM */
+#define HV_EXT_CALL_QUERY_CAPABILITIES_MASK		BIT(0)
+
 enum hv_isolation_type {
 	HV_ISOLATION_TYPE_NONE	= 0,
 	HV_ISOLATION_TYPE_VBS	= 1,
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7551b6f9c31c..b1892ea39a23 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1041,6 +1041,8 @@  struct kvm_hv {
 
 	struct hv_partition_assist_pg *hv_pa_pg;
 	struct kvm_hv_syndbg hv_syndbg;
+
+	u64 extended_hypercalls_cap;
 };
 
 struct msr_bitmap_range {
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 0adf4a437e85..5f0b7d8789a8 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -32,6 +32,7 @@ 
 #include <linux/eventfd.h>
 
 #include <asm/apicdef.h>
+#include <asm/hyperv-tlfs.h>
 #include <trace/events/kvm.h>
 
 #include "trace.h"
@@ -2140,6 +2141,8 @@  static void kvm_hv_hypercall_read_xmm(struct kvm_hv_hcall *hc)
 
 static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
 {
+	struct kvm_hv *hv;
+
 	if (!hv_vcpu->enforce_cpuid)
 		return true;
 
@@ -2178,6 +2181,14 @@  static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
 	case HVCALL_SEND_IPI:
 		return hv_vcpu->cpuid_cache.enlightenments_eax &
 			HV_X64_CLUSTER_IPI_RECOMMENDED;
+	case HV_EXT_CALL_QUERY_CAPABILITIES:
+		return hv_vcpu->cpuid_cache.features_ebx &
+				HV_ENABLE_EXTENDED_HYPERCALLS;
+	case HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY:
+		hv = to_kvm_hv(hv_vcpu->vcpu->kvm);
+		return hv->extended_hypercalls_cap &
+			HV_EXT_CAPABILITY_GET_BOOT_ZEROED_MEMORY;
+		break;
 	default:
 		break;
 	}
@@ -2189,6 +2200,7 @@  int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
 	struct kvm_hv_hcall hc;
+	struct kvm_hv *hv;
 	u64 ret = HV_STATUS_SUCCESS;
 
 	/*
@@ -2345,6 +2357,30 @@  int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 				kvm_hv_hypercall_complete_userspace;
 		return 0;
 	}
+	case HV_EXT_CALL_QUERY_CAPABILITIES:
+		if (unlikely(hc.fast)) {
+			ret = HV_STATUS_INVALID_PARAMETER;
+			break;
+		}
+		hv = to_kvm_hv(hv_vcpu->vcpu->kvm);
+		if (kvm_vcpu_write_guest(vcpu, hc.outgpa,
+					 &hv->extended_hypercalls_cap,
+					 sizeof(hv->extended_hypercalls_cap)))
+			ret = HV_STATUS_OPERATION_DENIED;
+		break;
+	case HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY:
+		if (unlikely(hc.fast)) {
+			ret = HV_STATUS_INVALID_PARAMETER;
+			break;
+		}
+		vcpu->run->exit_reason = KVM_EXIT_HYPERV;
+		vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
+		vcpu->run->hyperv.u.hcall.input = hc.param;
+		vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
+		vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
+		vcpu->arch.complete_userspace_io =
+				kvm_hv_hypercall_complete_userspace;
+		return 0;
 	default:
 		ret = HV_STATUS_INVALID_HYPERCALL_CODE;
 		break;
@@ -2494,6 +2530,7 @@  int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 
 			ent->ebx |= HV_POST_MESSAGES;
 			ent->ebx |= HV_SIGNAL_EVENTS;
+			ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
 
 			ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
 			ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
@@ -2578,3 +2615,21 @@  int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 
 	return 0;
 }
+
+int kvm_hv_set_ext_call_cap(struct kvm_vcpu *vcpu, uint64_t cap)
+{
+	struct kvm_hv *hv;
+	int r;
+
+	if (cap & ~HV_EXT_CALL_QUERY_CAPABILITIES_MASK)
+		return -EINVAL;
+
+	r = kvm_hv_vcpu_init(vcpu);
+	if (r)
+		return r;
+
+	hv = to_kvm_hv(vcpu->kvm);
+	hv->extended_hypercalls_cap = cap;
+
+	return 0;
+}
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 1030b1b50552..b92f8abdbf0d 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -150,5 +150,6 @@  int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce);
 int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
 int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 		     struct kvm_cpuid_entry2 __user *entries);
+int kvm_hv_set_ext_call_cap(struct kvm_vcpu *vcpu, uint64_t cap);
 
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4bd5f8a751de..caca1f537f6d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4515,6 +4515,9 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_X86_NOTIFY_VMEXIT:
 		r = kvm_caps.has_notify_vmexit;
 		break;
+	case KVM_CAP_HYPERV_EXT_CALL_QUERY:
+		r = HV_EXT_CALL_QUERY_CAPABILITIES_MASK;
+		break;
 	default:
 		break;
 	}
@@ -5510,6 +5513,8 @@  static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 			kvm_update_pv_runtime(vcpu);
 
 		return 0;
+	case KVM_CAP_HYPERV_EXT_CALL_QUERY:
+		return kvm_hv_set_ext_call_cap(vcpu, cap->args[0]);
 	default:
 		return -EINVAL;
 	}
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index fdce7a4cfc6f..15ffc2c5d950 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -162,6 +162,7 @@  struct ms_hyperv_tsc_page {
 
 /* Extended hypercalls */
 #define HV_EXT_CALL_QUERY_CAPABILITIES		0x8001
+#define HV_EXT_CALL_GET_BOOT_ZEROED_MEMORY	0x8002
 #define HV_EXT_CALL_MEMORY_HEAT_HINT		0x8003
 
 #define HV_FLUSH_ALL_PROCESSORS			BIT(0)
@@ -170,7 +171,8 @@  struct ms_hyperv_tsc_page {
 #define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT	BIT(3)
 
 /* Extended capability bits */
-#define HV_EXT_CAPABILITY_MEMORY_COLD_DISCARD_HINT BIT(8)
+#define HV_EXT_CAPABILITY_GET_BOOT_ZEROED_MEMORY	BIT(0)
+#define HV_EXT_CAPABILITY_MEMORY_COLD_DISCARD_HINT	BIT(8)
 
 enum HV_GENERIC_SET_FORMAT {
 	HV_GENERIC_SET_SPARSE_4K,
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..42860137e545 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1178,6 +1178,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_HYPERV_EXT_CALL_QUERY 224
 
 #ifdef KVM_CAP_IRQ_ROUTING