
[v7,23/27] KVM: arm64/sve: Add pseudo-register for the guest's vector lengths

Message ID 1553864452-15080-24-git-send-email-Dave.Martin@arm.com (mailing list archive)
State New, archived
Series KVM: arm64: SVE guest support

Commit Message

Dave Martin March 29, 2019, 1 p.m. UTC
This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
allow userspace to set and query the set of vector lengths visible
to the guest.

In the future, multiple register slices per SVE register may be
visible through the ioctl interface.  Once the set of slices has
been determined we would not be able to allow the vector length set
to be changed any more, in order to avoid userspace seeing
inconsistent sets of registers.  For this reason, this patch adds
support for explicit finalization of the SVE configuration via the
KVM_ARM_VCPU_FINALIZE ioctl.

Finalization is the proper place to allocate the SVE register state
storage in vcpu->arch.sve_state, so this patch adds that as
appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
was previously a no-op on arm64.

To simplify the logic for determining what vector lengths can be
supported, some code is added to KVM init to work this out, in the
kvm_arm_init_arch_resources() hook.

The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
Subsequent patches will allow SVE to be turned on for guest vcpus,
making it visible.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>

---

Changes since v5:

 * [Julien Thierry] Delete overzealous BUILD_BUG_ON() checks.
   It also turns out that these could cause kernel build failures in
   some configurations, even though the checked condition is compile-
   time constant.

   Because of the way the affected functions are called, the checks
   are superfluous, so the simplest option is simply to get rid of
   them.

 * [Julien Thierry] Free vcpu->arch.sve_state (if any) in
   kvm_arch_vcpu_uninit() (which is currently a no-op).

   This was accidentally lost during a previous rebase.

 * Add kvm_arm_init_arch_resources() hook, and use it to probe SVE
   configuration for KVM, to avoid duplicating the logic elsewhere.
   We only need to do this once.

 * Move sve_state buffer allocation to kvm_arm_vcpu_finalize().

   As well as making the code more straightforward, this avoids the
   need to allocate memory in kvm_reset_vcpu(), the meat of which is
   non-preemptible since commit 358b28f09f0a ("arm/arm64: KVM: Allow a
   VCPU to fully reset itself").

   The refactoring means that if this has not been done by the time
   we hit KVM_RUN, then this allocation will happen on the
   kvm_arm_first_run_init() path, where preemption remains enabled.

 * Add a couple of comments in {get,set}_sve_reg() to make the handling
   of the KVM_REG_ARM64_SVE_VLS special case a little clearer.

 * Move the mis-split rework (which avoided put_user() being the
   correct size in KVM_GET_REG_LIST only by accident) to "KVM: arm64:
   Enumerate SVE register indices for KVM_GET_REG_LIST".

 * Fix wrong WARN_ON() check sense when checking whether the
   implementation may need more SVE register slices than KVM can
   support.

 * Fix erroneous setting of vcpu->arch.sve_max_vl based on stale loop
   control variable vq.

 * Move definition of KVM_ARM_VCPU_SVE from "KVM: arm64/sve: Allow
   userspace to enable SVE for vcpus".

 * Migrate to explicit finalization of the SVE configuration, using
   KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE).
---
 arch/arm64/include/asm/kvm_host.h |  15 +++--
 arch/arm64/include/uapi/asm/kvm.h |   5 ++
 arch/arm64/kvm/guest.c            | 114 +++++++++++++++++++++++++++++++++++++-
 arch/arm64/kvm/reset.c            |  89 +++++++++++++++++++++++++++++
 4 files changed, 215 insertions(+), 8 deletions(-)
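
For context, the userspace flow this series enables looks roughly like
the following (a hypothetical sketch, not taken from the patch; includes
and error handling omitted, names as introduced by this series):

	struct kvm_vcpu_init init = { .target = KVM_ARM_TARGET_GENERIC_V8 };
	init.features[0] |= 1 << KVM_ARM_VCPU_SVE;
	ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);

	u64 vqs[8];	/* KVM_REG_SIZE_U512 => 8 x u64 */
	struct kvm_one_reg reg = {
		.id   = KVM_REG_ARM64_SVE_VLS,
		.addr = (__u64)(unsigned long)vqs,
	};
	ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);	/* query the supported set */
	/* ...optionally clear bits in vqs[] to shrink the set... */
	ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);	/* must precede finalization */

	int feature = KVM_ARM_VCPU_SVE;
	ioctl(vcpu_fd, KVM_ARM_VCPU_FINALIZE, &feature);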

Comments

Andrew Jones April 4, 2019, 8:18 p.m. UTC | #1
On Fri, Mar 29, 2019 at 01:00:48PM +0000, Dave Martin wrote:
> This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> allow userspace to set and query the set of vector lengths visible
> to the guest.
> 
> In the future, multiple register slices per SVE register may be
> visible through the ioctl interface.  Once the set of slices has
> been determined we would not be able to allow the vector length set
> to be changed any more, in order to avoid userspace seeing
> inconsistent sets of registers.  For this reason, this patch adds
> support for explicit finalization of the SVE configuration via the
> KVM_ARM_VCPU_FINALIZE ioctl.
> 
> Finalization is the proper place to allocate the SVE register state
> storage in vcpu->arch.sve_state, so this patch adds that as
> appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> was previously a no-op on arm64.
> 
> To simplify the logic for determining what vector lengths can be
> supported, some code is added to KVM init to work this out, in the
> kvm_arm_init_arch_resources() hook.
> 
> The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> Subsequent patches will allow SVE to be turned on for guest vcpus,
> making it visible.
> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
> 
> ---
> 
> Changes since v5:
> 
>  * [Julien Thierry] Delete overzealous BUILD_BUG_ON() checks.
>    It also turns out that these could cause kernel build failures in
>    some configurations, even though the checked condition is compile-
>    time constant.
> 
>    Because of the way the affected functions are called, the checks
>    are superfluous, so the simplest option is simply to get rid of
>    them.
> 
>  * [Julien Thierry] Free vcpu->arch.sve_state (if any) in
>    kvm_arch_vcpu_uninit() (which is currently a no-op).
> 
>    This was accidentally lost during a previous rebase.
> 
>  * Add kvm_arm_init_arch_resources() hook, and use it to probe SVE
>    configuration for KVM, to avoid duplicating the logic elsewhere.
>    We only need to do this once.
> 
>  * Move sve_state buffer allocation to kvm_arm_vcpu_finalize().
> 
>    As well as making the code more straightforward, this avoids the
>    need to allocate memory in kvm_reset_vcpu(), the meat of which is
>    non-preemptible since commit 358b28f09f0a ("arm/arm64: KVM: Allow a
>    VCPU to fully reset itself").
> 
>    The refactoring means that if this has not been done by the time
>    we hit KVM_RUN, then this allocation will happen on the
>    kvm_arm_first_run_init() path, where preemption remains enabled.
> 
>  * Add a couple of comments in {get,set}_sve_reg() to make the handling
>    of the KVM_REG_ARM64_SVE_VLS special case a little clearer.
> 
>  * Move the mis-split rework (which avoided put_user() being the
>    correct size in KVM_GET_REG_LIST only by accident) to "KVM: arm64:
>    Enumerate SVE register indices for KVM_GET_REG_LIST".
> 
>  * Fix wrong WARN_ON() check sense when checking whether the
>    implementation may need more SVE register slices than KVM can
>    support.
> 
>  * Fix erroneous setting of vcpu->arch.sve_max_vl based on stale loop
>    control variable vq.
> 
>  * Move definition of KVM_ARM_VCPU_SVE from "KVM: arm64/sve: Allow
>    userspace to enable SVE for vcpus".
> 
>  * Migrate to explicit finalization of the SVE configuration, using
>    KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE).
> ---
>  arch/arm64/include/asm/kvm_host.h |  15 +++--
>  arch/arm64/include/uapi/asm/kvm.h |   5 ++
>  arch/arm64/kvm/guest.c            | 114 +++++++++++++++++++++++++++++++++++++-
>  arch/arm64/kvm/reset.c            |  89 +++++++++++++++++++++++++++++
>  4 files changed, 215 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 98658f7..5475cc4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -23,7 +23,6 @@
>  #define __ARM64_KVM_HOST_H__
>  
>  #include <linux/bitmap.h>
> -#include <linux/errno.h>
>  #include <linux/types.h>
>  #include <linux/jump_label.h>
>  #include <linux/kvm_types.h>
> @@ -50,6 +49,7 @@
>  
>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>  
> +/* Will be incremented when KVM_ARM_VCPU_SVE is fully implemented: */
>  #define KVM_VCPU_MAX_FEATURES 4
>  
>  #define KVM_REQ_SLEEP \
> @@ -59,10 +59,12 @@
>  
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>  
> -static inline int kvm_arm_init_arch_resources(void) { return 0; }
> +extern unsigned int kvm_sve_max_vl;
> +int kvm_arm_init_arch_resources(void);
>  
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
>  int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext);
>  void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
>  
> @@ -353,6 +355,7 @@ struct kvm_vcpu_arch {
>  #define KVM_ARM64_HOST_SVE_IN_USE	(1 << 3) /* backup for host TIF_SVE */
>  #define KVM_ARM64_HOST_SVE_ENABLED	(1 << 4) /* SVE enabled for EL0 */
>  #define KVM_ARM64_GUEST_HAS_SVE		(1 << 5) /* SVE exposed to guest */
> +#define KVM_ARM64_VCPU_SVE_FINALIZED	(1 << 6) /* SVE config completed */
>  
>  #define vcpu_has_sve(vcpu) (system_supports_sve() && \
>  			    ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
> @@ -525,7 +528,6 @@ static inline bool kvm_arch_requires_vhe(void)
>  
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> -static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>  
> @@ -626,7 +628,10 @@ void kvm_arch_free_vm(struct kvm *kvm);
>  
>  int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
>  
> -#define kvm_arm_vcpu_finalize(vcpu, what) (-EINVAL)
> -#define kvm_arm_vcpu_is_finalized(vcpu) true
> +int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int what);
> +bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
> +
> +#define kvm_arm_vcpu_sve_finalized(vcpu) \
> +	((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
>  
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index ced760c..6963b7e 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -102,6 +102,7 @@ struct kvm_regs {
>  #define KVM_ARM_VCPU_EL1_32BIT		1 /* CPU running a 32bit VM */
>  #define KVM_ARM_VCPU_PSCI_0_2		2 /* CPU uses PSCI v0.2 */
>  #define KVM_ARM_VCPU_PMU_V3		3 /* Support guest PMUv3 */
> +#define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
>  
>  struct kvm_vcpu_init {
>  	__u32 target;
> @@ -243,6 +244,10 @@ struct kvm_vcpu_events {
>  					 ((n) << 5) | (i))
>  #define KVM_REG_ARM64_SVE_FFR(i)	KVM_REG_ARM64_SVE_PREG(16, i)
>  
> +/* Vector lengths pseudo-register: */
> +#define KVM_REG_ARM64_SVE_VLS		(KVM_REG_ARM64 | KVM_REG_ARM64_SVE | \
> +					 KVM_REG_SIZE_U512 | 0xffff)
> +
>  /* Device Control API: ARM VGIC */
>  #define KVM_DEV_ARM_VGIC_GRP_ADDR	0
>  #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS	1
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 2aa80a5..086ab05 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -206,6 +206,73 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  	return err;
>  }
>  
> +#define vq_word(vq) (((vq) - SVE_VQ_MIN) / 64)
> +#define vq_mask(vq) ((u64)1 << ((vq) - SVE_VQ_MIN) % 64)
> +
> +static bool vq_present(
> +	const u64 (*const vqs)[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)],
> +	unsigned int vq)
> +{
> +	return (*vqs)[vq_word(vq)] & vq_mask(vq);
> +}
> +
> +static int get_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> +{
> +	unsigned int max_vq, vq;
> +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> +
> +	if (WARN_ON(!sve_vl_valid(vcpu->arch.sve_max_vl)))
> +		return -EINVAL;
> +
> +	memset(vqs, 0, sizeof(vqs));
> +
> +	max_vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
> +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> +		if (sve_vq_available(vq))
> +			vqs[vq_word(vq)] |= vq_mask(vq);
> +
> +	if (copy_to_user((void __user *)reg->addr, vqs, sizeof(vqs)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static int set_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> +{
> +	unsigned int max_vq, vq;
> +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];

There are three of these long 'DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)'.
A macro is probably warranted.

> +
> +	if (kvm_arm_vcpu_sve_finalized(vcpu))
> +		return -EPERM; /* too late! */
> +
> +	if (WARN_ON(vcpu->arch.sve_state))
> +		return -EINVAL;
> +
> +	if (copy_from_user(vqs, (const void __user *)reg->addr, sizeof(vqs)))
> +		return -EFAULT;
> +
> +	max_vq = 0;
> +	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; ++vq)
> +		if (vq_present(&vqs, vq))
> +			max_vq = vq;
> +
> +	if (max_vq > sve_vq_from_vl(kvm_sve_max_vl))
> +		return -EINVAL;
> +
> +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> +		if (vq_present(&vqs, vq) != sve_vq_available(vq))
> +			return -EINVAL;

What about supporting the optional non-power-of-2 vector lengths? For
example, if the maximum VL is 512, then in addition to 512, 128, and
256 there could be a 384. If we plan to start a guest on a machine
that has all four and then migrate it to a machine that only has
128,256,512 we would want to set the VL set to 128,256,512, even on
the machine that supports 384. If I understand the above test correctly,
then that wouldn't work.

> +
> +	/* Can't run with no vector lengths at all: */
> +	if (max_vq < SVE_VQ_MIN)
> +		return -EINVAL;
> +
> +	/* vcpu->arch.sve_state will be alloc'd by kvm_vcpu_finalize_sve() */
> +	vcpu->arch.sve_max_vl = sve_vl_from_vq(max_vq);
> +
> +	return 0;
> +}
> +
>  #define SVE_REG_SLICE_SHIFT	0
>  #define SVE_REG_SLICE_BITS	5
>  #define SVE_REG_ID_SHIFT	(SVE_REG_SLICE_SHIFT + SVE_REG_SLICE_BITS)
> @@ -296,7 +363,19 @@ static int get_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  	struct sve_state_reg_region region;
>  	char __user *uptr = (char __user *)reg->addr;
>  
> -	if (!vcpu_has_sve(vcpu) || sve_reg_to_region(&region, vcpu, reg))
> +	if (!vcpu_has_sve(vcpu))
> +		return -ENOENT;
> +
> +	/* Handle the KVM_REG_ARM64_SVE_VLS pseudo-reg as a special case: */
> +	if (reg->id == KVM_REG_ARM64_SVE_VLS)
> +		return get_sve_vls(vcpu, reg);
> +
> +	/* Otherwise, reg is an architectural SVE register... */
> +
> +	if (!kvm_arm_vcpu_sve_finalized(vcpu))
> +		return -EPERM;
> +
> +	if (sve_reg_to_region(&region, vcpu, reg))
>  		return -ENOENT;
>  
>  	if (copy_to_user(uptr, vcpu->arch.sve_state + region.koffset,
> @@ -312,7 +391,19 @@ static int set_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  	struct sve_state_reg_region region;
>  	const char __user *uptr = (const char __user *)reg->addr;
>  
> -	if (!vcpu_has_sve(vcpu) || sve_reg_to_region(&region, vcpu, reg))
> +	if (!vcpu_has_sve(vcpu))
> +		return -ENOENT;
> +
> +	/* Handle the KVM_REG_ARM64_SVE_VLS pseudo-reg as a special case: */
> +	if (reg->id == KVM_REG_ARM64_SVE_VLS)
> +		return set_sve_vls(vcpu, reg);
> +
> +	/* Otherwise, reg is an architectural SVE register... */
> +
> +	if (!kvm_arm_vcpu_sve_finalized(vcpu))
> +		return -EPERM;
> +
> +	if (sve_reg_to_region(&region, vcpu, reg))
>  		return -ENOENT;
>  
>  	if (copy_from_user(vcpu->arch.sve_state + region.koffset, uptr,
> @@ -426,7 +517,11 @@ static unsigned long num_sve_regs(const struct kvm_vcpu *vcpu)
>  	if (!vcpu_has_sve(vcpu))
>  		return 0;
>  
> -	return slices * (SVE_NUM_PREGS + SVE_NUM_ZREGS + 1 /* FFR */);
> +	/* Policed by KVM_GET_REG_LIST: */
> +	WARN_ON(!kvm_arm_vcpu_sve_finalized(vcpu));
> +
> +	return slices * (SVE_NUM_PREGS + SVE_NUM_ZREGS + 1 /* FFR */)
> +		+ 1; /* KVM_REG_ARM64_SVE_VLS */
>  }
>  
>  static int copy_sve_reg_indices(const struct kvm_vcpu *vcpu,
> @@ -441,6 +536,19 @@ static int copy_sve_reg_indices(const struct kvm_vcpu *vcpu,
>  	if (!vcpu_has_sve(vcpu))
>  		return 0;
>  
> +	/* Policed by KVM_GET_REG_LIST: */
> +	WARN_ON(!kvm_arm_vcpu_sve_finalized(vcpu));
> +
> +	/*
> +	 * Enumerate this first, so that userspace can save/restore in
> +	 * the order reported by KVM_GET_REG_LIST:
> +	 */
> +	reg = KVM_REG_ARM64_SVE_VLS;
> +	if (put_user(reg, uindices++))
> +		return -EFAULT;
> +
> +	++num_regs;
> +
>  	for (i = 0; i < slices; i++) {
>  		for (n = 0; n < SVE_NUM_ZREGS; n++) {
>  			reg = KVM_REG_ARM64_SVE_ZREG(n, i);
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index f16a5f8..e7f9c06 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -23,11 +23,14 @@
>  #include <linux/kvm_host.h>
>  #include <linux/kvm.h>
>  #include <linux/hw_breakpoint.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
>  
>  #include <kvm/arm_arch_timer.h>
>  
>  #include <asm/cpufeature.h>
>  #include <asm/cputype.h>
> +#include <asm/fpsimd.h>
>  #include <asm/ptrace.h>
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
> @@ -99,6 +102,92 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	return r;
>  }
>  
> +unsigned int kvm_sve_max_vl;
> +
> +int kvm_arm_init_arch_resources(void)
> +{
> +	if (system_supports_sve()) {
> +		kvm_sve_max_vl = sve_max_virtualisable_vl;
> +
> +		/*
> +		 * The get_sve_reg()/set_sve_reg() ioctl interface will need
> +		 * to be extended with multiple register slice support in
> +		 * order to support vector lengths greater than
> +		 * SVE_VL_ARCH_MAX:
> +		 */
> +		if (WARN_ON(kvm_sve_max_vl > SVE_VL_ARCH_MAX))
> +			kvm_sve_max_vl = SVE_VL_ARCH_MAX;
> +
> +		/*
> +		 * Don't even try to make use of vector lengths that
> +		 * aren't available on all CPUs, for now:
> +		 */
> +		if (kvm_sve_max_vl < sve_max_vl)
> +			pr_warn("KVM: SVE vector length for guests limited to %u bytes\n",
> +				kvm_sve_max_vl);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Finalize vcpu's maximum SVE vector length, allocating
> + * vcpu->arch.sve_state as necessary.
> + */
> +static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
> +{
> +	void *buf;
> +	unsigned int vl;
> +
> +	vl = vcpu->arch.sve_max_vl;
> +
> +	/*
> +	 * Responsibility for these properties is shared between
> +	 * kvm_arm_init_arch_resources(), kvm_vcpu_enable_sve() and
> +	 * set_sve_vls().  Double-check here just to be sure:
> +	 */
> +	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl ||
> +		    vl > SVE_VL_ARCH_MAX))
> +		return -EIO;
> +
> +	buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	vcpu->arch.sve_state = buf;
> +	vcpu->arch.flags |= KVM_ARM64_VCPU_SVE_FINALIZED;
> +	return 0;
> +}
> +
> +int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int what)
> +{
> +	switch (what) {
> +	case KVM_ARM_VCPU_SVE:
> +		if (!vcpu_has_sve(vcpu))
> +			return -EINVAL;
> +
> +		if (kvm_arm_vcpu_sve_finalized(vcpu))
> +			return -EPERM;
> +
> +		return kvm_vcpu_finalize_sve(vcpu);
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu_has_sve(vcpu) && !kvm_arm_vcpu_sve_finalized(vcpu))
> +		return false;
> +
> +	return true;
> +}
> +
> +void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
> +{
> +	kfree(vcpu->arch.sve_state);
> +}
> +
>  /**
>   * kvm_reset_vcpu - sets core registers and sys_regs to reset value
>   * @vcpu: The VCPU pointer
> -- 
> 2.1.4
>

Thanks,
drew
Andrew Jones April 4, 2019, 8:31 p.m. UTC | #2
On Fri, Mar 29, 2019 at 01:00:48PM +0000, Dave Martin wrote:
> This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> allow userspace to set and query the set of vector lengths visible
> to the guest.
> 
> In the future, multiple register slices per SVE register may be
> visible through the ioctl interface.  Once the set of slices has
> been determined we would not be able to allow the vector length set
> to be changed any more, in order to avoid userspace seeing
> inconsistent sets of registers.  For this reason, this patch adds
> support for explicit finalization of the SVE configuration via the
> KVM_ARM_VCPU_FINALIZE ioctl.
> 
> Finalization is the proper place to allocate the SVE register state
> storage in vcpu->arch.sve_state, so this patch adds that as
> appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> was previously a no-op on arm64.
> 
> To simplify the logic for determining what vector lengths can be
> supported, some code is added to KVM init to work this out, in the
> kvm_arm_init_arch_resources() hook.
> 
> The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> Subsequent patches will allow SVE to be turned on for guest vcpus,
> making it visible.
> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
> 
> ---
> 
> Changes since v5:
> 
>  * [Julien Thierry] Delete overzealous BUILD_BUG_ON() checks.
>    It also turns out that these could cause kernel build failures in
>    some configurations, even though the checked condition is compile-
>    time constant.
> 
>    Because of the way the affected functions are called, the checks
>    are superfluous, so the simplest option is simply to get rid of
>    them.
> 
>  * [Julien Thierry] Free vcpu->arch.sve_state (if any) in
>    kvm_arch_vcpu_uninit() (which is currently a no-op).
> 
>    This was accidentally lost during a previous rebase.
> 
>  * Add kvm_arm_init_arch_resources() hook, and use it to probe SVE
>    configuration for KVM, to avoid duplicating the logic elsewhere.
>    We only need to do this once.
> 
>  * Move sve_state buffer allocation to kvm_arm_vcpu_finalize().
> 
>    As well as making the code more straightforward, this avoids the
>    need to allocate memory in kvm_reset_vcpu(), the meat of which is
>    non-preemptible since commit 358b28f09f0a ("arm/arm64: KVM: Allow a
>    VCPU to fully reset itself").
> 
>    The refactoring means that if this has not been done by the time
>    we hit KVM_RUN, then this allocation will happen on the
>    kvm_arm_first_run_init() path, where preemption remains enabled.
> 
>  * Add a couple of comments in {get,set}_sve_reg() to make the handling
>    of the KVM_REG_ARM64_SVE_VLS special case a little clearer.
> 
>  * Move the mis-split rework (which avoided put_user() being the
>    correct size in KVM_GET_REG_LIST only by accident) to "KVM: arm64:
>    Enumerate SVE register indices for KVM_GET_REG_LIST".
> 
>  * Fix wrong WARN_ON() check sense when checking whether the
>    implementation may need more SVE register slices than KVM can
>    support.
> 
>  * Fix erroneous setting of vcpu->arch.sve_max_vl based on stale loop
>    control variable vq.
> 
>  * Move definition of KVM_ARM_VCPU_SVE from "KVM: arm64/sve: Allow
>    userspace to enable SVE for vcpus".
> 
>  * Migrate to explicit finalization of the SVE configuration, using
>    KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE).
> ---
>  arch/arm64/include/asm/kvm_host.h |  15 +++--
>  arch/arm64/include/uapi/asm/kvm.h |   5 ++
>  arch/arm64/kvm/guest.c            | 114 +++++++++++++++++++++++++++++++++++++-
>  arch/arm64/kvm/reset.c            |  89 +++++++++++++++++++++++++++++
>  4 files changed, 215 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 98658f7..5475cc4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -23,7 +23,6 @@
>  #define __ARM64_KVM_HOST_H__
>  
>  #include <linux/bitmap.h>
> -#include <linux/errno.h>
>  #include <linux/types.h>
>  #include <linux/jump_label.h>
>  #include <linux/kvm_types.h>
> @@ -50,6 +49,7 @@
>  
>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>  
> +/* Will be incremented when KVM_ARM_VCPU_SVE is fully implemented: */
>  #define KVM_VCPU_MAX_FEATURES 4
>  
>  #define KVM_REQ_SLEEP \
> @@ -59,10 +59,12 @@
>  
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>  
> -static inline int kvm_arm_init_arch_resources(void) { return 0; }
> +extern unsigned int kvm_sve_max_vl;
> +int kvm_arm_init_arch_resources(void);
>  
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
>  int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext);
>  void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
>  
> @@ -353,6 +355,7 @@ struct kvm_vcpu_arch {
>  #define KVM_ARM64_HOST_SVE_IN_USE	(1 << 3) /* backup for host TIF_SVE */
>  #define KVM_ARM64_HOST_SVE_ENABLED	(1 << 4) /* SVE enabled for EL0 */
>  #define KVM_ARM64_GUEST_HAS_SVE		(1 << 5) /* SVE exposed to guest */
> +#define KVM_ARM64_VCPU_SVE_FINALIZED	(1 << 6) /* SVE config completed */
>  
>  #define vcpu_has_sve(vcpu) (system_supports_sve() && \
>  			    ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
> @@ -525,7 +528,6 @@ static inline bool kvm_arch_requires_vhe(void)
>  
>  static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> -static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>  
> @@ -626,7 +628,10 @@ void kvm_arch_free_vm(struct kvm *kvm);
>  
>  int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
>  
> -#define kvm_arm_vcpu_finalize(vcpu, what) (-EINVAL)
> -#define kvm_arm_vcpu_is_finalized(vcpu) true
> +int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int what);
> +bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
> +
> +#define kvm_arm_vcpu_sve_finalized(vcpu) \
> +	((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
>  
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index ced760c..6963b7e 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -102,6 +102,7 @@ struct kvm_regs {
>  #define KVM_ARM_VCPU_EL1_32BIT		1 /* CPU running a 32bit VM */
>  #define KVM_ARM_VCPU_PSCI_0_2		2 /* CPU uses PSCI v0.2 */
>  #define KVM_ARM_VCPU_PMU_V3		3 /* Support guest PMUv3 */
> +#define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
>  
>  struct kvm_vcpu_init {
>  	__u32 target;
> @@ -243,6 +244,10 @@ struct kvm_vcpu_events {
>  					 ((n) << 5) | (i))
>  #define KVM_REG_ARM64_SVE_FFR(i)	KVM_REG_ARM64_SVE_PREG(16, i)
>  
> +/* Vector lengths pseudo-register: */
> +#define KVM_REG_ARM64_SVE_VLS		(KVM_REG_ARM64 | KVM_REG_ARM64_SVE | \
> +					 KVM_REG_SIZE_U512 | 0xffff)
> +
>  /* Device Control API: ARM VGIC */
>  #define KVM_DEV_ARM_VGIC_GRP_ADDR	0
>  #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS	1
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 2aa80a5..086ab05 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -206,6 +206,73 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  	return err;
>  }
>  
> +#define vq_word(vq) (((vq) - SVE_VQ_MIN) / 64)
> +#define vq_mask(vq) ((u64)1 << ((vq) - SVE_VQ_MIN) % 64)
> +
> +static bool vq_present(
> +	const u64 (*const vqs)[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)],
> +	unsigned int vq)
> +{
> +	return (*vqs)[vq_word(vq)] & vq_mask(vq);
> +}
> +
> +static int get_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> +{
> +	unsigned int max_vq, vq;
> +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> +
> +	if (WARN_ON(!sve_vl_valid(vcpu->arch.sve_max_vl)))
> +		return -EINVAL;
> +
> +	memset(vqs, 0, sizeof(vqs));
> +
> +	max_vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
> +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> +		if (sve_vq_available(vq))
> +			vqs[vq_word(vq)] |= vq_mask(vq);
> +
> +	if (copy_to_user((void __user *)reg->addr, vqs, sizeof(vqs)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static int set_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> +{
> +	unsigned int max_vq, vq;
> +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> +
> +	if (kvm_arm_vcpu_sve_finalized(vcpu))
> +		return -EPERM; /* too late! */
> +
> +	if (WARN_ON(vcpu->arch.sve_state))
> +		return -EINVAL;
> +
> +	if (copy_from_user(vqs, (const void __user *)reg->addr, sizeof(vqs)))
> +		return -EFAULT;
> +
> +	max_vq = 0;
> +	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; ++vq)
> +		if (vq_present(&vqs, vq))
> +			max_vq = vq;
> +
> +	if (max_vq > sve_vq_from_vl(kvm_sve_max_vl))
> +		return -EINVAL;
> +
> +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> +		if (vq_present(&vqs, vq) != sve_vq_available(vq))
> +			return -EINVAL;
> +
> +	/* Can't run with no vector lengths at all: */
> +	if (max_vq < SVE_VQ_MIN)
> +		return -EINVAL;
> +
> +	/* vcpu->arch.sve_state will be alloc'd by kvm_vcpu_finalize_sve() */
> +	vcpu->arch.sve_max_vl = sve_vl_from_vq(max_vq);
> +
> +	return 0;
> +}
> +
>  #define SVE_REG_SLICE_SHIFT	0
>  #define SVE_REG_SLICE_BITS	5
>  #define SVE_REG_ID_SHIFT	(SVE_REG_SLICE_SHIFT + SVE_REG_SLICE_BITS)
> @@ -296,7 +363,19 @@ static int get_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  	struct sve_state_reg_region region;
>  	char __user *uptr = (char __user *)reg->addr;
>  
> -	if (!vcpu_has_sve(vcpu) || sve_reg_to_region(&region, vcpu, reg))
> +	if (!vcpu_has_sve(vcpu))
> +		return -ENOENT;
> +
> +	/* Handle the KVM_REG_ARM64_SVE_VLS pseudo-reg as a special case: */
> +	if (reg->id == KVM_REG_ARM64_SVE_VLS)
> +		return get_sve_vls(vcpu, reg);
> +
> +	/* Otherwise, reg is an architectural SVE register... */
> +
> +	if (!kvm_arm_vcpu_sve_finalized(vcpu))
> +		return -EPERM;
> +
> +	if (sve_reg_to_region(&region, vcpu, reg))
>  		return -ENOENT;
>  
>  	if (copy_to_user(uptr, vcpu->arch.sve_state + region.koffset,
> @@ -312,7 +391,19 @@ static int set_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  	struct sve_state_reg_region region;
>  	const char __user *uptr = (const char __user *)reg->addr;
>  
> -	if (!vcpu_has_sve(vcpu) || sve_reg_to_region(&region, vcpu, reg))
> +	if (!vcpu_has_sve(vcpu))
> +		return -ENOENT;
> +
> +	/* Handle the KVM_REG_ARM64_SVE_VLS pseudo-reg as a special case: */
> +	if (reg->id == KVM_REG_ARM64_SVE_VLS)
> +		return set_sve_vls(vcpu, reg);
> +
> +	/* Otherwise, reg is an architectural SVE register... */
> +
> +	if (!kvm_arm_vcpu_sve_finalized(vcpu))
> +		return -EPERM;
> +
> +	if (sve_reg_to_region(&region, vcpu, reg))
>  		return -ENOENT;
>  
>  	if (copy_from_user(vcpu->arch.sve_state + region.koffset, uptr,
> @@ -426,7 +517,11 @@ static unsigned long num_sve_regs(const struct kvm_vcpu *vcpu)
>  	if (!vcpu_has_sve(vcpu))
>  		return 0;
>  
> -	return slices * (SVE_NUM_PREGS + SVE_NUM_ZREGS + 1 /* FFR */);
> +	/* Policed by KVM_GET_REG_LIST: */
> +	WARN_ON(!kvm_arm_vcpu_sve_finalized(vcpu));
> +
> +	return slices * (SVE_NUM_PREGS + SVE_NUM_ZREGS + 1 /* FFR */)
> +		+ 1; /* KVM_REG_ARM64_SVE_VLS */
>  }
>  
>  static int copy_sve_reg_indices(const struct kvm_vcpu *vcpu,
> @@ -441,6 +536,19 @@ static int copy_sve_reg_indices(const struct kvm_vcpu *vcpu,
>  	if (!vcpu_has_sve(vcpu))
>  		return 0;
>  
> +	/* Policed by KVM_GET_REG_LIST: */
> +	WARN_ON(!kvm_arm_vcpu_sve_finalized(vcpu));
> +
> +	/*
> +	 * Enumerate this first, so that userspace can save/restore in
> +	 * the order reported by KVM_GET_REG_LIST:
> +	 */
> +	reg = KVM_REG_ARM64_SVE_VLS;
> +	if (put_user(reg, uindices++))
> +		return -EFAULT;
> +
> +	++num_regs;
> +
>  	for (i = 0; i < slices; i++) {
>  		for (n = 0; n < SVE_NUM_ZREGS; n++) {
>  			reg = KVM_REG_ARM64_SVE_ZREG(n, i);
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index f16a5f8..e7f9c06 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -23,11 +23,14 @@
>  #include <linux/kvm_host.h>
>  #include <linux/kvm.h>
>  #include <linux/hw_breakpoint.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
>  
>  #include <kvm/arm_arch_timer.h>
>  
>  #include <asm/cpufeature.h>
>  #include <asm/cputype.h>
> +#include <asm/fpsimd.h>
>  #include <asm/ptrace.h>
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
> @@ -99,6 +102,92 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	return r;
>  }
>  
> +unsigned int kvm_sve_max_vl;
> +
> +int kvm_arm_init_arch_resources(void)
> +{
> +	if (system_supports_sve()) {
> +		kvm_sve_max_vl = sve_max_virtualisable_vl;
> +
> +		/*
> +		 * The get_sve_reg()/set_sve_reg() ioctl interface will need
> +		 * to be extended with multiple register slice support in
> +		 * order to support vector lengths greater than
> +		 * SVE_VL_ARCH_MAX:
> +		 */
> +		if (WARN_ON(kvm_sve_max_vl > SVE_VL_ARCH_MAX))
> +			kvm_sve_max_vl = SVE_VL_ARCH_MAX;
> +
> +		/*
> +		 * Don't even try to make use of vector lengths that
> +		 * aren't available on all CPUs, for now:
> +		 */
> +		if (kvm_sve_max_vl < sve_max_vl)
> +			pr_warn("KVM: SVE vector length for guests limited to %u bytes\n",
> +				kvm_sve_max_vl);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Finalize vcpu's maximum SVE vector length, allocating
> + * vcpu->arch.sve_state as necessary.
> + */
> +static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
> +{
> +	void *buf;
> +	unsigned int vl;
> +
> +	vl = vcpu->arch.sve_max_vl;
> +
> +	/*
> +	 * Responsibility for these properties is shared between
> +	 * kvm_arm_init_arch_resources(), kvm_vcpu_enable_sve() and
> +	 * set_sve_vls().  Double-check here just to be sure:
> +	 */
> +	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl ||
> +		    vl > SVE_VL_ARCH_MAX))
> +		return -EIO;
> +
> +	buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	vcpu->arch.sve_state = buf;
> +	vcpu->arch.flags |= KVM_ARM64_VCPU_SVE_FINALIZED;
> +	return 0;
> +}
> +
> +int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int what)
> +{
> +	switch (what) {
> +	case KVM_ARM_VCPU_SVE:
> +		if (!vcpu_has_sve(vcpu))
> +			return -EINVAL;
> +
> +		if (kvm_arm_vcpu_sve_finalized(vcpu))
> +			return -EPERM;
> +
> +		return kvm_vcpu_finalize_sve(vcpu);

In the next patch I see that we already have KVM_ARM64_GUEST_HAS_SVE
set in vcpu->arch.flags at this point. So if this kvm_vcpu_finalize_sve()
call fails, shouldn't we unset KVM_ARM64_GUEST_HAS_SVE and anything
else used to indicate the vcpu has sve?

Thanks,
drew


> +	}
> +
> +	return -EINVAL;
> +}
> +
> +bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu_has_sve(vcpu) && !kvm_arm_vcpu_sve_finalized(vcpu))
> +		return false;
> +
> +	return true;
> +}
> +
> +void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
> +{
> +	kfree(vcpu->arch.sve_state);
> +}
> +
>  /**
>   * kvm_reset_vcpu - sets core registers and sys_regs to reset value
>   * @vcpu: The VCPU pointer
> -- 
> 2.1.4
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Dave Martin April 5, 2019, 9:36 a.m. UTC | #3
On Thu, Apr 04, 2019 at 10:18:54PM +0200, Andrew Jones wrote:
> On Fri, Mar 29, 2019 at 01:00:48PM +0000, Dave Martin wrote:
> > This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> > allow userspace to set and query the set of vector lengths visible
> > to the guest.
> > 
> > In the future, multiple register slices per SVE register may be
> > visible through the ioctl interface.  Once the set of slices has
> > been determined we would not be able to allow the vector length set
> > to be changed any more, in order to avoid userspace seeing
> > inconsistent sets of registers.  For this reason, this patch adds
> > support for explicit finalization of the SVE configuration via the
> > KVM_ARM_VCPU_FINALIZE ioctl.
> > 
> > Finalization is the proper place to allocate the SVE register state
> > storage in vcpu->arch.sve_state, so this patch adds that as
> > appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> > was previously a no-op on arm64.
> > 
> > To simplify the logic for determining what vector lengths can be
> > supported, some code is added to KVM init to work this out, in the
> > kvm_arm_init_arch_resources() hook.
> > 
> > The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> > Subsequent patches will allow SVE to be turned on for guest vcpus,
> > making it visible.
> > 
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> > Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
> > 
> > ---

[...]

> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index 2aa80a5..086ab05 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -206,6 +206,73 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >  	return err;
> >  }
> >  
> > +#define vq_word(vq) (((vq) - SVE_VQ_MIN) / 64)
> > +#define vq_mask(vq) ((u64)1 << ((vq) - SVE_VQ_MIN) % 64)
> > +
> > +static bool vq_present(
> > +	const u64 (*const vqs)[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)],
> > +	unsigned int vq)
> > +{
> > +	return (*vqs)[vq_word(vq)] & vq_mask(vq);
> > +}
> > +
> > +static int get_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > +{
> > +	unsigned int max_vq, vq;
> > +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> > +
> > +	if (WARN_ON(!sve_vl_valid(vcpu->arch.sve_max_vl)))
> > +		return -EINVAL;
> > +
> > +	memset(vqs, 0, sizeof(vqs));
> > +
> > +	max_vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
> > +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> > +		if (sve_vq_available(vq))
> > +			vqs[vq_word(vq)] |= vq_mask(vq);
> > +
> > +	if (copy_to_user((void __user *)reg->addr, vqs, sizeof(vqs)))
> > +		return -EFAULT;
> > +
> > +	return 0;
> > +}
> > +
> > +static int set_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > +{
> > +	unsigned int max_vq, vq;
> > +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> 
> There are three of these long 'DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)'.
> A macro is probably warranted.

Fair point.  These are a bit cumbersome.  How about:

#define SVE_VLS_WORDS DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)
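
so that the three sites reduce to, e.g. (sketch):

	static bool vq_present(const u64 (*const vqs)[SVE_VLS_WORDS],
			       unsigned int vq);
	u64 vqs[SVE_VLS_WORDS];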

Annoyingly, this is virtually identical to a Linux bitmap: the base type
is u64 instead of unsigned long; otherwise there's no intentional
difference.
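
(For comparison, the bitmap form would have been just -- sketch:

	DECLARE_BITMAP(vqs, SVE_VQ_MAX - SVE_VQ_MIN + 1);

though unsigned long is 32 bits on 32-bit hosts, which matters for a
layout that userspace sees directly.)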

(Aside: do you know why the KVM API insists on everything being u64?
This makes sense for non-native types (like guest registers), but not
for native things like host userspace addresses etc.)

> > +
> > +	if (kvm_arm_vcpu_sve_finalized(vcpu))
> > +		return -EPERM; /* too late! */
> > +
> > +	if (WARN_ON(vcpu->arch.sve_state))
> > +		return -EINVAL;
> > +
> > +	if (copy_from_user(vqs, (const void __user *)reg->addr, sizeof(vqs)))
> > +		return -EFAULT;
> > +
> > +	max_vq = 0;
> > +	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; ++vq)
> > +		if (vq_present(&vqs, vq))
> > +			max_vq = vq;
> > +
> > +	if (max_vq > sve_vq_from_vl(kvm_sve_max_vl))
> > +		return -EINVAL;
> > +
> > +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> > +		if (vq_present(&vqs, vq) != sve_vq_available(vq))
> > +			return -EINVAL;
> 
> What about supporting the optional non-power-of-2 vector lengths? For
> example, if the maximum VL is 512, then in addition to 512, 128, and
> 256 there could be a 384. If we plan to start a guest on a machine
> that has all four and then migrate it to a machine that only has
> 128,256,512 we would want to set the VL set to 128,256,512, even on
> the machine that supports 384. If I understand the above test correctly,
> then that wouldn't work.

This is exactly what the code is trying to forbid, though it may not be
obvious here why.

The trouble is, we can't correctly emulate a vcpu supporting
{128,256,512} on hardware that also supports 384-bit vectors: the
architecture says that the vector length you get is determined by
rounding ZCR_EL1.LEN down to the next vector length actually
supported.

So, the guest would expect writing ZCR_EL1.LEN = 2 to give a vector
length of 256 bits, whereas on this hardware the actual resulting
vector length would be 384 bits.

sve_probe_vqs() relies on this property to detect the set of available
vector lengths.  Simple, bare-metal guests might also just set
ZCR_ELx.LEN = 0x1ff to just get the max available.
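
(For illustration, the rounding rule amounts to something like this
hypothetical helper -- not actual kernel code:

	/* Effective VQ for a given ZCR_ELx.LEN and set of supported VQs: */
	unsigned int effective_vq(unsigned int len, const bool *vq_supported)
	{
		unsigned int vq;

		for (vq = len + 1; vq > 1; --vq)	/* requested VQ = LEN + 1 */
			if (vq_supported[vq])
				break;

		return vq;	/* VQ 1 (128 bits) is always supported */
	}

so with hardware VQs {1,2,3,4}, LEN = 2 gives VQ 3 (384 bits), whereas
the emulated set {1,2,4} is supposed to give VQ 2 (256 bits).)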


The architecture says categorically that the set of vector lengths
supported is a fixed property of the (emulated) hardware -- we can't
have this changing under the guest's feet.

Fixing this would require an architectural way to filter out individual
vector lengths from the supported set, not just a way to clamp the
maximum (which is all ZCR_EL2.LEN gives us).

The general expectation is that sanely designed clusters will be
"homogeneous enough" and won't trigger this check -- it's here
just in case.


I attempt to capture this in api.txt, leaving the door open in case the
architecture gives a way to support this in future:

  | KVM_REG_ARM64_SVE_VLS
  | 
  | [...]
  | 
  | Apart from simply removing all vector lengths from the host set
  | that exceed some value, support for arbitrarily chosen sets of
  | vector lengths is hardware-dependent and may not be available.
  | Attempting to configure an invalid set of vector lengths via
  | KVM_SET_ONE_REG will fail with EINVAL.

Is this enough, or do you think more explanation is needed somewhere?

[...]

Cheers
---Dave
Dave Martin April 5, 2019, 9:36 a.m. UTC | #4
On Thu, Apr 04, 2019 at 10:31:09PM +0200, Andrew Jones wrote:
> On Fri, Mar 29, 2019 at 01:00:48PM +0000, Dave Martin wrote:
> > This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> > allow userspace to set and query the set of vector lengths visible
> > to the guest.
> > 
> > In the future, multiple register slices per SVE register may be
> > visible through the ioctl interface.  Once the set of slices has
> > been determined we would not be able to allow the vector length set
> > to be changed any more, in order to avoid userspace seeing
> > inconsistent sets of registers.  For this reason, this patch adds
> > support for explicit finalization of the SVE configuration via the
> > KVM_ARM_VCPU_FINALIZE ioctl.
> > 
> > Finalization is the proper place to allocate the SVE register state
> > storage in vcpu->arch.sve_state, so this patch adds that as
> > appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> > was previously a no-op on arm64.
> > 
> > To simplify the logic for determining what vector lengths can be
> > supported, some code is added to KVM init to work this out, in the
> > kvm_arm_init_arch_resources() hook.
> > 
> > The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> > Subsequent patches will allow SVE to be turned on for guest vcpus,
> > making it visible.
> > 
> > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> > Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>

[...]

> > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c

[...]

> > +/*
> > + * Finalize vcpu's maximum SVE vector length, allocating
> > + * vcpu->arch.sve_state as necessary.
> > + */
> > +static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
> > +{
> > +	void *buf;
> > +	unsigned int vl;
> > +
> > +	vl = vcpu->arch.sve_max_vl;
> > +
> > +	/*
> > +	 * Resposibility for these properties is shared between
> > +	 * kvm_arm_init_arch_resources(), kvm_vcpu_enable_sve() and
> > +	 * set_sve_vls().  Double-check here just to be sure:
> > +	 */
> > +	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl ||
> > +		    vl > SVE_VL_ARCH_MAX))
> > +		return -EIO;
> > +
> > +	buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
> > +	if (!buf)
> > +		return -ENOMEM;
> > +
> > +	vcpu->arch.sve_state = buf;
> > +	vcpu->arch.flags |= KVM_ARM64_VCPU_SVE_FINALIZED;
> > +	return 0;
> > +}
> > +
> > +int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int what)
> > +{
> > +	switch (what) {
> > +	case KVM_ARM_VCPU_SVE:
> > +		if (!vcpu_has_sve(vcpu))
> > +			return -EINVAL;
> > +
> > +		if (kvm_arm_vcpu_sve_finalized(vcpu))
> > +			return -EPERM;
> > +
> > +		return kvm_vcpu_finalize_sve(vcpu);
> 
> In the next patch I see that we already have KVM_ARM64_GUEST_HAS_SVE
> set in vcpu->arch.flags at this point. So if this kvm_vcpu_finalize_sve()
> call fails, shouldn't we unset KVM_ARM64_GUEST_HAS_SVE and anything
> else used to indicate the vcpu has sve?

If this fails, either userspace did the wrong thing, or there was some
fatal error (like the ENOMEM case).  Either way, the vcpu is not runnable
and userspace is expected to bail out.

Further, KVM_ARM_VCPU_INIT fixes the set of features requested by
userspace: we shouldn't change the set of features under userspace's
feet and try to limp on somehow.  We have no means for userspace to find
out that some features went away in the meantime...

So, what would you be trying to achieve with this change?

[...]

Cheers
---Dave
Andrew Jones April 5, 2019, 10:14 a.m. UTC | #5
On Fri, Apr 05, 2019 at 10:36:03AM +0100, Dave Martin wrote:
> On Thu, Apr 04, 2019 at 10:18:54PM +0200, Andrew Jones wrote:
> > On Fri, Mar 29, 2019 at 01:00:48PM +0000, Dave Martin wrote:
> > > This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> > > allow userspace to set and query the set of vector lengths visible
> > > to the guest.
> > > 
> > > In the future, multiple register slices per SVE register may be
> > > visible through the ioctl interface.  Once the set of slices has
> > > been determined we would not be able to allow the vector length set
> > > to be changed any more, in order to avoid userspace seeing
> > > inconsistent sets of registers.  For this reason, this patch adds
> > > support for explicit finalization of the SVE configuration via the
> > > KVM_ARM_VCPU_FINALIZE ioctl.
> > > 
> > > Finalization is the proper place to allocate the SVE register state
> > > storage in vcpu->arch.sve_state, so this patch adds that as
> > > appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> > > was previously a no-op on arm64.
> > > 
> > > To simplify the logic for determining what vector lengths can be
> > > supported, some code is added to KVM init to work this out, in the
> > > kvm_arm_init_arch_resources() hook.
> > > 
> > > The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> > > Subsequent patches will allow SVE to be turned on for guest vcpus,
> > > making it visible.
> > > 
> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > > Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> > > Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
> > > 
> > > ---
> 
> [...]
> 
> > > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > > index 2aa80a5..086ab05 100644
> > > --- a/arch/arm64/kvm/guest.c
> > > +++ b/arch/arm64/kvm/guest.c
> > > @@ -206,6 +206,73 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > >  	return err;
> > >  }
> > >  
> > > +#define vq_word(vq) (((vq) - SVE_VQ_MIN) / 64)
> > > +#define vq_mask(vq) ((u64)1 << ((vq) - SVE_VQ_MIN) % 64)
> > > +
> > > +static bool vq_present(
> > > +	const u64 (*const vqs)[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)],
> > > +	unsigned int vq)
> > > +{
> > > +	return (*vqs)[vq_word(vq)] & vq_mask(vq);
> > > +}
> > > +
> > > +static int get_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > +{
> > > +	unsigned int max_vq, vq;
> > > +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> > > +
> > > +	if (WARN_ON(!sve_vl_valid(vcpu->arch.sve_max_vl)))
> > > +		return -EINVAL;
> > > +
> > > +	memset(vqs, 0, sizeof(vqs));
> > > +
> > > +	max_vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
> > > +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> > > +		if (sve_vq_available(vq))
> > > +			vqs[vq_word(vq)] |= vq_mask(vq);
> > > +
> > > +	if (copy_to_user((void __user *)reg->addr, vqs, sizeof(vqs)))
> > > +		return -EFAULT;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int set_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > +{
> > > +	unsigned int max_vq, vq;
> > > +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> > 
> > There are three of these long 'DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)'.
> > A macro is probably warranted.
> 
> Fair point.  These are a bit cumbersome.  How about:
> 
> #define SVE_VLS_WORDS DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)
> 
> Annoyingly, this is virtually identical to a Linux bitmap: the base type
> is u64 instead of unsigned long; otherwise there's no intentional
> difference.

Yeah, I noticed it was a reinvented bitmap, but I liked that it was:
for the userspace API, avoiding the bitmap abstractions makes it
explicit what is stored where.

> 
> (Aside: do you know why the KVM API insists on everything being u64?
> This makes sense for non-native types (like guest registers), but not
> for native things like host userspace addresses etc.)

Would that work when the kernel is 64-bit and the userspace is 32? I
personally like the interfaces being explicit with type size, as it
removes the brain burden of translating long and int when moving arch
to arch and kernel to userspace.

> 
> > > +
> > > +	if (kvm_arm_vcpu_sve_finalized(vcpu))
> > > +		return -EPERM; /* too late! */
> > > +
> > > +	if (WARN_ON(vcpu->arch.sve_state))
> > > +		return -EINVAL;
> > > +
> > > +	if (copy_from_user(vqs, (const void __user *)reg->addr, sizeof(vqs)))
> > > +		return -EFAULT;
> > > +
> > > +	max_vq = 0;
> > > +	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; ++vq)
> > > +		if (vq_present(&vqs, vq))
> > > +			max_vq = vq;
> > > +
> > > +	if (max_vq > sve_vq_from_vl(kvm_sve_max_vl))
> > > +		return -EINVAL;
> > > +
> > > +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> > > +		if (vq_present(&vqs, vq) != sve_vq_available(vq))
> > > +			return -EINVAL;
> > 
> > What about supporting the optional non-power-of-2 vector lengths? For
> > example, if the maximum VL is 512, then in addition to 512, 128, and
> > 256 there could be a 384. If we plan to start a guest on a machine
> > that has all four and then migrate it to a machine that only has
> > 128,256,512 we would want to set the VL set to 128,256,512, even on
> > the machine that supports 384. If I understand the above test correctly,
> > then that wouldn't work.
> 
> This is exactly what the code is trying to forbid, though it may not be
> obvious here why.
> 
> The trouble is, we can't correctly emulate a vcpu supporting
> {128,256,512} on hardware that also supports 384-bit vectors: the
> architecture says that the vector length you get is determined by
> rounding ZCR_EL1.LEN down to the next vector length actually
> supported.
> 
> So, the guest would expect writing ZCR_EL1.LEN = 2 to give a vector
> length of 256 bits, whereas on this hardware the actual resulting
> vector length would be 384 bits.

Oh, I see.

> 
> sve_probe_vqs() relies on this property to detect the set of available
> vector lengths.  Simple, bare-metal guests might also just set
> ZCR_ELx.LEN = 0x1ff to just get the max available.
> 
> 
> The architecture says categorically that the set of vector lengths
> supported is a fixed property of the (emulated) hardware -- we can't
> have this changing under the guest's feet.
> 
> Fixing this would require an architectural way to filter out individual
> vector lengths from the supported set, not just a way to clamp the
> maximum (which is all ZCR_EL2.LEN gives us).
> 
> The general expectation is that sanely designed clusters will be
> "homogeneous enough" and won't trigger this check -- it's here
> just in case.

Data centers evolve as they expand and hardware is replaced. I wouldn't
expect them to stay homogeneous for long, not without keeping them that
way at substantial cost. This isn't the first thing that is forcing Arm
virt data centers to be homogeneous (we still use qemu's '-cpu host'),
but it's disappointing to have yet another one.

> 
> 
> I attempt to capture this in api.txt, leaving the door open in case the
> architecture gives a way to support this in future:
> 
>   | KVM_REG_ARM64_SVE_VLS
>   | 
>   | [...]
>   | 
>   | Apart from simply removing all vector lengths from the host set
>   | that exceed some value, support for arbitrarily chosen sets of
>   | vector lengths is hardware-dependent and may not be available.
>   | Attempting to configure an invalid set of vector lengths via
>   | KVM_SET_ONE_REG will fail with EINVAL.
> 
> Is this enough, or do you think more explanation is needed somewhere?

I saw that before, so I guess more is needed :) What you wrote above about
how we can only shorten the list due to how zcr_el1.len works was the
missing piece for me. Maybe that should be integrated into the text
somehow.

Thanks,
drew
Andrew Jones April 5, 2019, 10:22 a.m. UTC | #6
On Fri, Apr 05, 2019 at 10:36:10AM +0100, Dave Martin wrote:
> On Thu, Apr 04, 2019 at 10:31:09PM +0200, Andrew Jones wrote:
> > On Fri, Mar 29, 2019 at 01:00:48PM +0000, Dave Martin wrote:
> > > This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> > > allow userspace to set and query the set of vector lengths visible
> > > to the guest.
> > > 
> > > In the future, multiple register slices per SVE register may be
> > > visible through the ioctl interface.  Once the set of slices has
> > > been determined we would not be able to allow the vector length set
> > > to be changed any more, in order to avoid userspace seeing
> > > inconsistent sets of registers.  For this reason, this patch adds
> > > support for explicit finalization of the SVE configuration via the
> > > KVM_ARM_VCPU_FINALIZE ioctl.
> > > 
> > > Finalization is the proper place to allocate the SVE register state
> > > storage in vcpu->arch.sve_state, so this patch adds that as
> > > appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> > > was previously a no-op on arm64.
> > > 
> > > To simplify the logic for determining what vector lengths can be
> > > supported, some code is added to KVM init to work this out, in the
> > > kvm_arm_init_arch_resources() hook.
> > > 
> > > The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> > > Subsequent patches will allow SVE to be turned on for guest vcpus,
> > > making it visible.
> > > 
> > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > > Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> > > Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
> 
> [...]
> 
> > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> 
> [...]
> 
> > > +/*
> > > + * Finalize vcpu's maximum SVE vector length, allocating
> > > + * vcpu->arch.sve_state as necessary.
> > > + */
> > > +static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
> > > +{
> > > +	void *buf;
> > > +	unsigned int vl;
> > > +
> > > +	vl = vcpu->arch.sve_max_vl;
> > > +
> > > +	/*
> > > +	 * Responsibility for these properties is shared between
> > > +	 * kvm_arm_init_arch_resources(), kvm_vcpu_enable_sve() and
> > > +	 * set_sve_vls().  Double-check here just to be sure:
> > > +	 */
> > > +	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl ||
> > > +		    vl > SVE_VL_ARCH_MAX))
> > > +		return -EIO;
> > > +
> > > +	buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
> > > +	if (!buf)
> > > +		return -ENOMEM;
> > > +
> > > +	vcpu->arch.sve_state = buf;
> > > +	vcpu->arch.flags |= KVM_ARM64_VCPU_SVE_FINALIZED;
> > > +	return 0;
> > > +}
> > > +
> > > +int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int what)
> > > +{
> > > +	switch (what) {
> > > +	case KVM_ARM_VCPU_SVE:
> > > +		if (!vcpu_has_sve(vcpu))
> > > +			return -EINVAL;
> > > +
> > > +		if (kvm_arm_vcpu_sve_finalized(vcpu))
> > > +			return -EPERM;
> > > +
> > > +		return kvm_vcpu_finalize_sve(vcpu);
> > 
> > In the next patch I see that we already have KVM_ARM64_GUEST_HAS_SVE
> > set in vcpu->arch.flags at this point. So if this kvm_vcpu_finalize_sve()
> > call fails, shouldn't we unset KVM_ARM64_GUEST_HAS_SVE and anything
> > else used to indicate the vcpu has sve?
> 
> If this fails, either userspace did the wrong thing, or there was some
> fatal error (like the ENOMEM case).  Either way, the vcpu is not runnable
> and userspace is expected to bail out.
> 
> Further, KVM_ARM_VCPU_INIT fixes the set of features requested by
> userspace: we shouldn't change the set of features under userspace's
> feet and try to limp on somehow.  We have no means for userspace to find
> out that some features went away in the meantime...
> 
> So, what would you be trying to achieve with this change?

Being able to do additional vcpu ioctls with only partially initialized
sve (sve_state is still null, but we otherwise believe the vcpu has sve).
That sounds risky. Although I see we only set KVM_ARM64_VCPU_SVE_FINALIZED
when everything, like the memory allocation, succeeds, so we're probably
fine. Indeed having KVM_ARM64_GUEST_HAS_SVE and not KVM_ARM64_VCPU_SVE_FINALIZED
is likely the safest vcpu state for a vcpu that failed to finalize sve.

Thanks,
drew
Dave Martin April 5, 2019, 12:54 p.m. UTC | #7
On Fri, Apr 05, 2019 at 12:14:51PM +0200, Andrew Jones wrote:
> On Fri, Apr 05, 2019 at 10:36:03AM +0100, Dave Martin wrote:
> > On Thu, Apr 04, 2019 at 10:18:54PM +0200, Andrew Jones wrote:
> > > On Fri, Mar 29, 2019 at 01:00:48PM +0000, Dave Martin wrote:
> > > > This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> > > > allow userspace to set and query the set of vector lengths visible
> > > > to the guest.
> > > > 
> > > > In the future, multiple register slices per SVE register may be
> > > > visible through the ioctl interface.  Once the set of slices has
> > > > been determined we would not be able to allow the vector length set
> > > > to be changed any more, in order to avoid userspace seeing
> > > > inconsistent sets of registers.  For this reason, this patch adds
> > > > support for explicit finalization of the SVE configuration via the
> > > > KVM_ARM_VCPU_FINALIZE ioctl.
> > > > 
> > > > Finalization is the proper place to allocate the SVE register state
> > > > storage in vcpu->arch.sve_state, so this patch adds that as
> > > > appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> > > > was previously a no-op on arm64.
> > > > 
> > > > To simplify the logic for determining what vector lengths can be
> > > > supported, some code is added to KVM init to work this out, in the
> > > > kvm_arm_init_arch_resources() hook.
> > > > 
> > > > The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> > > > Subsequent patches will allow SVE to be turned on for guest vcpus,
> > > > making it visible.
> > > > 
> > > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > > > Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> > > > Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
> > > > 
> > > > ---
> > 
> > [...]
> > 
> > > > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > > > index 2aa80a5..086ab05 100644
> > > > --- a/arch/arm64/kvm/guest.c
> > > > +++ b/arch/arm64/kvm/guest.c
> > > > @@ -206,6 +206,73 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > >  	return err;
> > > >  }
> > > >  
> > > > +#define vq_word(vq) (((vq) - SVE_VQ_MIN) / 64)
> > > > +#define vq_mask(vq) ((u64)1 << ((vq) - SVE_VQ_MIN) % 64)
> > > > +
> > > > +static bool vq_present(
> > > > +	const u64 (*const vqs)[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)],
> > > > +	unsigned int vq)
> > > > +{
> > > > +	return (*vqs)[vq_word(vq)] & vq_mask(vq);
> > > > +}
> > > > +
> > > > +static int get_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > +{
> > > > +	unsigned int max_vq, vq;
> > > > +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> > > > +
> > > > +	if (WARN_ON(!sve_vl_valid(vcpu->arch.sve_max_vl)))
> > > > +		return -EINVAL;
> > > > +
> > > > +	memset(vqs, 0, sizeof(vqs));
> > > > +
> > > > +	max_vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
> > > > +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> > > > +		if (sve_vq_available(vq))
> > > > +			vqs[vq_word(vq)] |= vq_mask(vq);
> > > > +
> > > > +	if (copy_to_user((void __user *)reg->addr, vqs, sizeof(vqs)))
> > > > +		return -EFAULT;
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static int set_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > +{
> > > > +	unsigned int max_vq, vq;
> > > > +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> > > 
> > > There are three of these long 'DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)'.
> > > A macro is probably warranted.
> > 
> > Fair point.  These are a bit cumbersome.  How about:
> > 
> > #define SVE_VLS_WORDS DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)
> > 
> > Annoyingly, this is virtually identical to a Linux bitmap: the base type
> > is u64 instead of unsigned long; otherwise there's no intentional
> > difference.
> 
> Yeah, I noticed it was a reinvented bitmap, but I liked that it was,
> because, for the userspace api, removing the bitmap abstractions
> ensures we know what is what and where exactly it is.

Agreed.  I thought of using the bitmap API inside the kernel, but even
if this is arm64-specific code, it made me uneasy: the compiler doesn't
necessarily think that u64 and unsigned long are the same type even if
they're both 64-bit, so there's the potential for weird optimisations to
happen.
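
(For concreteness, a minimal sketch of how the declarations would look
with the SVE_VLS_WORDS name proposed above -- the macro isn't in the
tree yet, so treat the name as provisional:)

	#define SVE_VLS_WORDS DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)

	static bool vq_present(const u64 (*const vqs)[SVE_VLS_WORDS],
			       unsigned int vq)
	{
		return (*vqs)[vq_word(vq)] & vq_mask(vq);
	}

	/* ...and in get_sve_vls()/set_sve_vls(): */
	u64 vqs[SVE_VLS_WORDS];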

> > (Aside: do you know why the KVM API insists on everything being u64?
> > This makes sense for non-native types (like guest registers), but not
> > for native things like host userspace addresses etc.)
> 
> Would that work when the kernel is 64-bit and the userspace is 32? I
> personally like the interfaces being explicit with type size, as it
> removes the brain burden of translating long and int when moving arch
> to arch and kernel to userspace.

Ah yes, the joys of compat.  And ioctl.  Ignore me.

> > > > +
> > > > +	if (kvm_arm_vcpu_sve_finalized(vcpu))
> > > > +		return -EPERM; /* too late! */
> > > > +
> > > > +	if (WARN_ON(vcpu->arch.sve_state))
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (copy_from_user(vqs, (const void __user *)reg->addr, sizeof(vqs)))
> > > > +		return -EFAULT;
> > > > +
> > > > +	max_vq = 0;
> > > > +	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; ++vq)
> > > > +		if (vq_present(&vqs, vq))
> > > > +			max_vq = vq;
> > > > +
> > > > +	if (max_vq > sve_vq_from_vl(kvm_sve_max_vl))
> > > > +		return -EINVAL;
> > > > +
> > > > +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> > > > +		if (vq_present(&vqs, vq) != sve_vq_available(vq))
> > > > +			return -EINVAL;
> > > 
> > > What about supporting the optional non-power-of-2 vector lengths? For
> > > example, if the maximum VL is 512, then in addition to 512, 128, and
> > > 256 there could be a 384. If we plan to start a guest on a machine
> > > that has all four and then migrate it to a machine that only has
> > > 128,256,512 we would want to set the VL set to 128,256,512, even on
> > > the machine that supports 384. If I understand the above test correctly,
> > > then that wouldn't work.
> > 
> > This is exactly what the code is trying to forbid, though it may not be
> > obvious here why.
> > 
> > The trouble is, we can't correctly emulate a vcpu supporting
> > {128,256,512} on hardware that also supports 384-bit vectors: the
> > architecture says that the vector length you get is determined by
> > rounding ZCR_EL1.LEN down to the next vector length actually
> > supported.
> > 
> > So, the guest would expect writing ZCR_EL1.LEN = 2 to give a vector
> > length of 256 bits, whereas on this hardware the actual resulting
> > vector length would be 384 bits.
> 
> Oh, I see.
> 
> > 
> > sve_probe_vqs() relies on this property to detect the set of available
> > vector lengths.  Simple, bare-metal guests might also just set
> > ZCR_ELx.LEN = 0x1ff to get the max available.
> > 
> > 
> > The architecture says categorically that the set of vector lengths
> > supported is a fixed property of the (emulated) hardware -- we can't
> > have this changing under the guest's feet.
> > 
> > Fixing this would require an architectural way to filter out individual
> > vector lengths from the supported set, not just a way to clamp the
> > maximum (which is all ZCR_EL2.LEN gives us).
> > 
> > The general expectation is that sanely designed clusters will be
> > "homogeneous enough" and won't trigger this check -- it's here
> > just in case.
> 
> Data centers evolve as they expand and hardware is replaced. I wouldn't
> expect them to stay homogeneous for long, not without keeping them that
> way at substantial cost. This isn't the first thing that is forcing Arm
> virt data centers to be homogeneous (we still use qemu's '-cpu host'),
> but it's disappointing to have yet another one.

This may or may not be a problem in practice, since I suspect
implementations that support non-power-of-two vector lengths may be in
the minority anyway.

The ARM Fast Model and probably QEMU can do it, but people are less
likely to try to build a high-performance cluster out of those...
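
(To make the rounding behaviour concrete, here is a small standalone
illustration -- not kernel code, and the supported set {128,256,384,512}
is hypothetical:)

	#include <stdio.h>

	/* Hypothetical hardware supporting 128-, 256-, 384- and 512-bit VLs: */
	static const unsigned int supported_vls[] = { 128, 256, 384, 512 };
	#define NR_VLS (sizeof(supported_vls) / sizeof(supported_vls[0]))

	/* ZCR_ELx.LEN requests (LEN + 1) * 128 bits; the hardware rounds
	 * down to the largest supported VL not exceeding the request: */
	static unsigned int effective_vl(unsigned int len)
	{
		unsigned int requested = (len + 1) * 128;
		unsigned int vl = 128, i;

		for (i = 0; i < NR_VLS; i++)
			if (supported_vls[i] <= requested)
				vl = supported_vls[i];

		return vl;
	}

	int main(void)
	{
		/* A guest migrated from a {128,256,512} machine writes
		 * LEN = 2 expecting 256 bits, but gets 384 here: */
		printf("LEN=2 -> %u bits\n", effective_vl(2));
		return 0;
	}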

> > I attempt to capture this in api.txt, leaving the door open in case the
> > architecture gives a way to support this in future:
> > 
> >   | KVM_REG_ARM64_SVE_VLS
> >   | 
> >   | [...]
> >   | 
> >   | Apart from simply removing all vector lengths from the host set
> >   | that exceed some value, support for arbitrarily chosen sets of
> >   | vector lengths is hardware-dependent and may not be available.
> >   | Attempting to configure an invalid set of vector lengths via
> >   | KVM_SET_ONE_REG will fail with EINVAL.
> > 
> > Is this enough, or do you think more explanation is needed somewhere?
> 
> I saw that before, so I guess more is needed :) What you wrote above about
> how we can only shorten the list due to how zcr_el1.len works was the
> missing piece for me. Maybe that should be integrated into the text
> somehow.

If you think the above is enough for ABI documentation purposes, I will
aim to drop the following comment into set_sve_vls():

	/*
	 * Vector lengths supported by the host can't currently be
	 * hidden from the guest individually: instead we can only set a
	 * maximum via ZCR_EL2.LEN.  So, make sure the available vector
	 * lengths match the set requested exactly up to the requested
	 * maximum:
	 */
	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
		if (vq_present(&vqs, vq) != sve_vq_available(vq))
			return -EINVAL;

Do you think that's enough?

Cheers
---Dave
Dave Martin April 5, 2019, 2:06 p.m. UTC | #8
On Fri, Apr 05, 2019 at 12:22:04PM +0200, Andrew Jones wrote:
> On Fri, Apr 05, 2019 at 10:36:10AM +0100, Dave Martin wrote:
> > On Thu, Apr 04, 2019 at 10:31:09PM +0200, Andrew Jones wrote:
> > > On Fri, Mar 29, 2019 at 01:00:48PM +0000, Dave Martin wrote:
> > > > This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> > > > allow userspace to set and query the set of vector lengths visible
> > > > to the guest.
> > > > 
> > > > In the future, multiple register slices per SVE register may be
> > > > visible through the ioctl interface.  Once the set of slices has
> > > > been determined we would not be able to allow the vector length set
> > > > to be changed any more, in order to avoid userspace seeing
> > > > inconsistent sets of registers.  For this reason, this patch adds
> > > > support for explicit finalization of the SVE configuration via the
> > > > KVM_ARM_VCPU_FINALIZE ioctl.
> > > > 
> > > > Finalization is the proper place to allocate the SVE register state
> > > > storage in vcpu->arch.sve_state, so this patch adds that as
> > > > appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> > > > was previously a no-op on arm64.
> > > > 
> > > > To simplify the logic for determining what vector lengths can be
> > > > supported, some code is added to KVM init to work this out, in the
> > > > kvm_arm_init_arch_resources() hook.
> > > > 
> > > > The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> > > > Subsequent patches will allow SVE to be turned on for guest vcpus,
> > > > making it visible.
> > > > 
> > > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > > > Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> > > > Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
> > 
> > [...]
> > 
> > > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > 
> > [...]
> > 
> > > > +/*
> > > > + * Finalize vcpu's maximum SVE vector length, allocating
> > > > + * vcpu->arch.sve_state as necessary.
> > > > + */
> > > > +static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	void *buf;
> > > > +	unsigned int vl;
> > > > +
> > > > +	vl = vcpu->arch.sve_max_vl;
> > > > +
> > > > +	/*
> > > > +	 * Responsibility for these properties is shared between
> > > > +	 * kvm_arm_init_arch_resources(), kvm_vcpu_enable_sve() and
> > > > +	 * set_sve_vls().  Double-check here just to be sure:
> > > > +	 */
> > > > +	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl ||
> > > > +		    vl > SVE_VL_ARCH_MAX))
> > > > +		return -EIO;
> > > > +
> > > > +	buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
> > > > +	if (!buf)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	vcpu->arch.sve_state = buf;
> > > > +	vcpu->arch.flags |= KVM_ARM64_VCPU_SVE_FINALIZED;
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int what)
> > > > +{
> > > > +	switch (what) {
> > > > +	case KVM_ARM_VCPU_SVE:
> > > > +		if (!vcpu_has_sve(vcpu))
> > > > +			return -EINVAL;
> > > > +
> > > > +		if (kvm_arm_vcpu_sve_finalized(vcpu))
> > > > +			return -EPERM;
> > > > +
> > > > +		return kvm_vcpu_finalize_sve(vcpu);
> > > 
> > > In the next patch I see that we already have KVM_ARM64_GUEST_HAS_SVE
> > > set in vcpu->arch.flags at this point. So if this kvm_vcpu_finalize_sve()
> > > call fails, shouldn't we unset KVM_ARM64_GUEST_HAS_SVE and anything
> > > else used to indicate the vcpu has sve?
> > 
> > If this fails, either userspace did the wrong thing, or there was some
> > fatal error (like the ENOMEM case).  Either way, the vcpu is not runnable
> > and userspace is expected to bail out.
> > 
> > Further, KVM_ARM_VCPU_INIT fixes the set of features requested by
> > userspace: we shouldn't change the set of features under userspace's
> > feet and try to limp on somehow.  We have no means for userspace to find
> > out that some features went away in the meantime...
> > 
> > So, what would you be trying to achieve with this change?
> 
> Being able to do additional vcpu ioctls with only partially initialized
> sve (sve_state is still null, but we otherwise believe the vcpu has sve).
> That sounds risky. Although I see we only set KVM_ARM64_VCPU_SVE_FINALIZED
> when everything, like the memory allocation, succeeds, so we're probably
> fine. Indeed having KVM_ARM64_GUEST_HAS_SVE and not KVM_ARM64_VCPU_SVE_FINALIZED
> is likely the safest vcpu state for a vcpu that failed to finalize sve.

This was my rationale: every non-trivial operation that is affected by
SVE checks for kvm_arm_vcpu_sve_finalized(): accessing
KVM_REG_ARM64_SVE_VLS should be the only exception.

Of course, people might forget that in new code, but I think all the
major ABI code paths that are likely to exist for SVE are already there
now.

Does this sound OK, or do you still have concerns?

Clearing KVM_ARM64_GUEST_HAS_SVE could be done on finalization failure;
it just feels a bit weird.

Cheers
---Dave
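
(For reference, the check Dave describes, as it appears in this patch's
get_sve_reg()/set_sve_reg():)

	if (!vcpu_has_sve(vcpu))
		return -ENOENT;

	/* KVM_REG_ARM64_SVE_VLS is the one pre-finalization exception: */
	if (reg->id == KVM_REG_ARM64_SVE_VLS)
		return get_sve_vls(vcpu, reg);

	/* Everything else is refused until the SVE config is finalized: */
	if (!kvm_arm_vcpu_sve_finalized(vcpu))
		return -EPERM;
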
Andrew Jones April 5, 2019, 3:33 p.m. UTC | #9
On Fri, Apr 05, 2019 at 01:54:13PM +0100, Dave Martin wrote:
> On Fri, Apr 05, 2019 at 12:14:51PM +0200, Andrew Jones wrote:
> > On Fri, Apr 05, 2019 at 10:36:03AM +0100, Dave Martin wrote:
> > > On Thu, Apr 04, 2019 at 10:18:54PM +0200, Andrew Jones wrote:
> > > > On Fri, Mar 29, 2019 at 01:00:48PM +0000, Dave Martin wrote:
> > > > > This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> > > > > allow userspace to set and query the set of vector lengths visible
> > > > > to the guest.
> > > > > 
> > > > > In the future, multiple register slices per SVE register may be
> > > > > visible through the ioctl interface.  Once the set of slices has
> > > > > been determined we would not be able to allow the vector length set
> > > > > to be changed any more, in order to avoid userspace seeing
> > > > > inconsistent sets of registers.  For this reason, this patch adds
> > > > > support for explicit finalization of the SVE configuration via the
> > > > > KVM_ARM_VCPU_FINALIZE ioctl.
> > > > > 
> > > > > Finalization is the proper place to allocate the SVE register state
> > > > > storage in vcpu->arch.sve_state, so this patch adds that as
> > > > > appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> > > > > was previously a no-op on arm64.
> > > > > 
> > > > > To simplify the logic for determining what vector lengths can be
> > > > > supported, some code is added to KVM init to work this out, in the
> > > > > kvm_arm_init_arch_resources() hook.
> > > > > 
> > > > > The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> > > > > Subsequent patches will allow SVE to be turned on for guest vcpus,
> > > > > making it visible.
> > > > > 
> > > > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > > > > Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> > > > > Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
> > > > > 
> > > > > ---
> > > 
> > > [...]
> > > 
> > > > > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > > > > index 2aa80a5..086ab05 100644
> > > > > --- a/arch/arm64/kvm/guest.c
> > > > > +++ b/arch/arm64/kvm/guest.c
> > > > > @@ -206,6 +206,73 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > >  	return err;
> > > > >  }
> > > > >  
> > > > > +#define vq_word(vq) (((vq) - SVE_VQ_MIN) / 64)
> > > > > +#define vq_mask(vq) ((u64)1 << ((vq) - SVE_VQ_MIN) % 64)
> > > > > +
> > > > > +static bool vq_present(
> > > > > +	const u64 (*const vqs)[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)],
> > > > > +	unsigned int vq)
> > > > > +{
> > > > > +	return (*vqs)[vq_word(vq)] & vq_mask(vq);
> > > > > +}
> > > > > +
> > > > > +static int get_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > > +{
> > > > > +	unsigned int max_vq, vq;
> > > > > +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> > > > > +
> > > > > +	if (WARN_ON(!sve_vl_valid(vcpu->arch.sve_max_vl)))
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	memset(vqs, 0, sizeof(vqs));
> > > > > +
> > > > > +	max_vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
> > > > > +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> > > > > +		if (sve_vq_available(vq))
> > > > > +			vqs[vq_word(vq)] |= vq_mask(vq);
> > > > > +
> > > > > +	if (copy_to_user((void __user *)reg->addr, vqs, sizeof(vqs)))
> > > > > +		return -EFAULT;
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +static int set_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> > > > > +{
> > > > > +	unsigned int max_vq, vq;
> > > > > +	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> > > > 
> > > > There are three of these long 'DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)'.
> > > > A macro is probably warranted.
> > > 
> > > Fair point.  These are a bit cumbersome.  How about:
> > > 
> > > #define SVE_VLS_WORDS DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)
> > > 
> > > Annoyingly, this is virtually identical to a Linux bitmap: the base type
> > > is u64 instead of unsigned long; otherwise there's no intentional
> > > difference.
> > 
> > Yeah, I noticed it was a reinvented bitmap, but I liked that it was,
> > because, for the userspace api, removing the bitmap abstractions
> > ensures we know what is what and where exactly it is.
> 
> Agreed.  I thought of using the bitmap API inside the kernel, but even
> if this is arm64-specific code, it made me uneasy: the compiler doesn't
> necessarily think that u64 and unsigned long are the same type even if
> they're both 64-bit, so there's the potential for weird optimisations to
> happen.
> 
> > > (Aside: do you know why the KVM API insists on everything being u64?
> > > This makes sense for non-native types (like guest registers), but not
> > > for native things like host userspace addresses etc.)
> > 
> > Would that work when the kernel is 64-bit and the userspace is 32? I
> > personally like the interfaces being explicit with type size, as it
> > removes the brain burden of translating long and int when moving arch
> > to arch and kernel to userspace.
> 
> Ah yes, the joys of compat.  And ioctl.  Ignore me.
> 
> > > > > +
> > > > > +	if (kvm_arm_vcpu_sve_finalized(vcpu))
> > > > > +		return -EPERM; /* too late! */
> > > > > +
> > > > > +	if (WARN_ON(vcpu->arch.sve_state))
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	if (copy_from_user(vqs, (const void __user *)reg->addr, sizeof(vqs)))
> > > > > +		return -EFAULT;
> > > > > +
> > > > > +	max_vq = 0;
> > > > > +	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; ++vq)
> > > > > +		if (vq_present(&vqs, vq))
> > > > > +			max_vq = vq;
> > > > > +
> > > > > +	if (max_vq > sve_vq_from_vl(kvm_sve_max_vl))
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> > > > > +		if (vq_present(&vqs, vq) != sve_vq_available(vq))
> > > > > +			return -EINVAL;
> > > > 
> > > > What about supporting the optional non-power-of-2 vector lengths? For
> > > > example, if the maximum VL is 512, then in addition to 512, 128, and
> > > > 256 there could be a 384. If we plan to start a guest on a machine
> > > > that has all four and then migrate it to a machine that only has
> > > > 128,256,512 we would want to set the VL set to 128,256,512, even on
> > > > the machine that supports 384. If I understand the above test correctly,
> > > > then that wouldn't work.
> > > 
> > > This is exactly what the code is trying to forbid, though it may not be
> > > obvious here why.
> > > 
> > > The trouble is, we can't correctly emulate a vcpu supporting
> > > {128,256,512} on hardware that also supports 384-bit vectors: the
> > > architecture says that the vector length you get is determined by
> > > rounding ZCR_EL1.LEN down to the next vector length actually
> > > supported.
> > > 
> > > So, the guest would expect writing ZCR_EL1.LEN = 2 to give a vector
> > > length of 256 bits, whereas on this hardware the actual resulting
> > > vector length would be 384 bits.
> > 
> > Oh, I see.
> > 
> > > 
> > > sve_probe_vqs() relies on this property to detect the set of available
> > > vector lengths.  Simple, bare-metal guests might also just set
> > > ZCR_ELx.LEN = 0x1ff to get the max available.
> > > 
> > > 
> > > The architecture says categorically that the set of vector lengths
> > > supported is a fixed property of the (emulated) hardware -- we can't
> > > have this changing under the guest's feet.
> > > 
> > > Fixing this would require an architectural way to filter out individual
> > > vector lengths from the supported set, not just a way to clamp the
> > > maximum (which is all ZCR_EL2.LEN gives us).
> > > 
> > > The general expectation is that sanely designed clusters will be
> > > "homogeneous enough" and won't trigger this check -- it's here
> > > just in case.
> > 
> > Data centers evolve as they expand and hardware is replaced. I wouldn't
> > expect them to stay homogeneous for long, not without keeping them that
> > way at substantial cost. This isn't the first thing that is forcing Arm
> > virt data centers to be homogeneous (we still use qemu's '-cpu host'),
> > but it's disappointing to have yet another one.
> 
> This may or may not be a problem in practice, since I suspect
> implementations that support non-power-of-two vector lengths may be in
> the minority anyway.
> 
> The ARM Fast Model and probably QEMU can do it, but people are less
> likely to try to build a high-performance cluster out of those...
> 
> > > I attempt to capture this in api.txt, leaving the door open in case the
> > > architecture gives a way to support this in future:
> > > 
> > >   | KVM_REG_ARM64_SVE_VLS
> > >   | 
> > >   | [...]
> > >   | 
> > >   | Apart from simply removing all vector lengths from the host set
> > >   | that exceed some value, support for arbitrarily chosen sets of
> > >   | vector lengths is hardware-dependent and may not be available.
> > >   | Attempting to configure an invalid set of vector lengths via
> > >   | KVM_SET_ONE_REG will fail with EINVAL.
> > > 
> > > Is this enough, or do you think more explanation is needed somewhere?
> > 
> > I saw that before, so I guess more is needed :) What you wrote above about
> > how we can only shorten the list due to how zcr_el1.len works was the
> > missing piece for me. Maybe that should be integrated into the text
> > somehow.
> 
> If you think the above is enough for ABI documentation purposes, I will
> aim to drop the following comment into set_sve_vls():
> 
> 	/*
> 	 * Vector lengths supported by the host can't currently be
> 	 * hidden from the guest individually: instead we can only set a
> 	 * maximum via ZCR_EL2.LEN.  So, make sure the available vector
> 	 * lengths match the set requested exactly up to the requested
> 	 * maximum:
> 	 */
> 	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> 		if (vq_present(&vqs, vq) != sve_vq_available(vq))
> 			return -EINVAL;
> 
> Do you think that's enough?
>

That works for me, and I'm glad I understand it now, because this
will be a key piece of the QEMU user interface to work out. I.e. we
need to be able to provide the user with the current host's VL list
up to the max virtualizable VL when queried, allow them to choose a max
from that list for a guest, and then commit to that VL sublist for the
guest. I'll dive into QEMU on Monday, hopefully I won't drown there.

drew
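
(A rough sketch of the userspace flow being described, based on the ABI
added by this series -- error handling elided, and the exact
KVM_ARM_VCPU_FINALIZE calling convention is an assumption to check
against the final api.txt:)

	#include <linux/kvm.h>
	#include <stdint.h>
	#include <sys/ioctl.h>

	static void configure_sve(int vm_fd, int vcpu_fd, unsigned int max_vq)
	{
		struct kvm_vcpu_init init;
		uint64_t vqs[8];	/* 8 x u64 = 512 bits, per KVM_REG_SIZE_U512 */
		struct kvm_one_reg reg = {
			.id   = KVM_REG_ARM64_SVE_VLS,
			.addr = (uint64_t)(unsigned long)vqs,
		};
		int feature = KVM_ARM_VCPU_SVE;
		unsigned int vq;

		ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init);
		init.features[0] |= 1 << KVM_ARM_VCPU_SVE;
		ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);

		/* Query the host's supported set: */
		ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);

		/* Prune all vector lengths above the chosen maximum
		 * (bit vq - 1 represents vector length vq * 128 bits): */
		for (vq = max_vq + 1; vq <= 512; ++vq)
			vqs[(vq - 1) / 64] &= ~((uint64_t)1 << ((vq - 1) % 64));

		/* Commit the sublist, then finalize: */
		ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
		ioctl(vcpu_fd, KVM_ARM_VCPU_FINALIZE, &feature);
	}
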
Andrew Jones April 5, 2019, 3:41 p.m. UTC | #10
On Fri, Apr 05, 2019 at 03:06:40PM +0100, Dave Martin wrote:
> On Fri, Apr 05, 2019 at 12:22:04PM +0200, Andrew Jones wrote:
> > On Fri, Apr 05, 2019 at 10:36:10AM +0100, Dave Martin wrote:
> > > On Thu, Apr 04, 2019 at 10:31:09PM +0200, Andrew Jones wrote:
> > > > On Fri, Mar 29, 2019 at 01:00:48PM +0000, Dave Martin wrote:
> > > > > This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> > > > > allow userspace to set and query the set of vector lengths visible
> > > > > to the guest.
> > > > > 
> > > > > In the future, multiple register slices per SVE register may be
> > > > > visible through the ioctl interface.  Once the set of slices has
> > > > > been determined we would not be able to allow the vector length set
> > > > > to be changed any more, in order to avoid userspace seeing
> > > > > inconsistent sets of registers.  For this reason, this patch adds
> > > > > support for explicit finalization of the SVE configuration via the
> > > > > KVM_ARM_VCPU_FINALIZE ioctl.
> > > > > 
> > > > > Finalization is the proper place to allocate the SVE register state
> > > > > storage in vcpu->arch.sve_state, so this patch adds that as
> > > > > appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> > > > > was previously a no-op on arm64.
> > > > > 
> > > > > To simplify the logic for determining what vector lengths can be
> > > > > supported, some code is added to KVM init to work this out, in the
> > > > > kvm_arm_init_arch_resources() hook.
> > > > > 
> > > > > The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> > > > > Subsequent patches will allow SVE to be turned on for guest vcpus,
> > > > > making it visible.
> > > > > 
> > > > > Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> > > > > Reviewed-by: Julien Thierry <julien.thierry@arm.com>
> > > > > Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
> > > 
> > > [...]
> > > 
> > > > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > > 
> > > [...]
> > > 
> > > > > +/*
> > > > > + * Finalize vcpu's maximum SVE vector length, allocating
> > > > > + * vcpu->arch.sve_state as necessary.
> > > > > + */
> > > > > +static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > +	void *buf;
> > > > > +	unsigned int vl;
> > > > > +
> > > > > +	vl = vcpu->arch.sve_max_vl;
> > > > > +
> > > > > +	/*
> > > > > +	 * Responsibility for these properties is shared between
> > > > > +	 * kvm_arm_init_arch_resources(), kvm_vcpu_enable_sve() and
> > > > > +	 * set_sve_vls().  Double-check here just to be sure:
> > > > > +	 */
> > > > > +	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl ||
> > > > > +		    vl > SVE_VL_ARCH_MAX))
> > > > > +		return -EIO;
> > > > > +
> > > > > +	buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
> > > > > +	if (!buf)
> > > > > +		return -ENOMEM;
> > > > > +
> > > > > +	vcpu->arch.sve_state = buf;
> > > > > +	vcpu->arch.flags |= KVM_ARM64_VCPU_SVE_FINALIZED;
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int what)
> > > > > +{
> > > > > +	switch (what) {
> > > > > +	case KVM_ARM_VCPU_SVE:
> > > > > +		if (!vcpu_has_sve(vcpu))
> > > > > +			return -EINVAL;
> > > > > +
> > > > > +		if (kvm_arm_vcpu_sve_finalized(vcpu))
> > > > > +			return -EPERM;
> > > > > +
> > > > > +		return kvm_vcpu_finalize_sve(vcpu);
> > > > 
> > > > In the next patch I see that we already have KVM_ARM64_GUEST_HAS_SVE
> > > > set in vcpu->arch.flags at this point. So if this kvm_vcpu_finalize_sve()
> > > > call fails, shouldn't we unset KVM_ARM64_GUEST_HAS_SVE and anything
> > > > else used to indicate the vcpu has sve?
> > > 
> > > If this fails, either userspace did the wrong thing, or there was some
> > > fatal error (like the ENOMEM case).  Either way, the vcpu is not runnable
> > > and userspace is expected to bail out.
> > > 
> > > Further, KVM_ARM_VCPU_INIT fixes the set of features requested by
> > > userspace: we shouldn't change the set of features under userspace's
> > > feet and try to limp on somehow.  We have no means for userspace to find
> > > out that some features went away in the meantime...
> > > 
> > > So, what would you be trying to achieve with this change?
> > 
> > Being able to do additional vcpu ioctls with only partially initialized
> > sve (sve_state is still null, but we otherwise believe the vcpu has sve).
> > That sounds risky. Although I see we only set KVM_ARM64_VCPU_SVE_FINALIZED
> > when everything, like the memory allocation, succeeds, so we're probably
> > fine. Indeed having KVM_ARM64_GUEST_HAS_SVE and not KVM_ARM64_VCPU_SVE_FINALIZED
> > is likely the safest vcpu state for a vcpu that failed to finalize sve.
> 
> This was my rationale: every non-trivial operation that is affected by
> SVE checks for kvm_arm_vcpu_sve_finalized(): accessing
> KVM_REG_ARM64_SVE_VLS should be the only exception.
> 
> Of course, people might forget that in new code, but I think all the
> major ABI code paths that are likely to exist for SVE are already there
> now.
> 
> Does this sound OK, or do you still have concerns?
> 
> Clearing KVM_ARM64_GUEST_HAS_SVE could be done on finalization failure;
> it just feels a bit weird.

Let's leave it. It's safe now, and there's always a risk new code could
screw something up, no matter which way we go now, so let's pick the
simpler way.

Thanks,
drew
Dave Martin April 10, 2019, 12:35 p.m. UTC | #11
On Fri, Apr 05, 2019 at 05:41:31PM +0200, Andrew Jones wrote:
> On Fri, Apr 05, 2019 at 03:06:40PM +0100, Dave Martin wrote:

[...]

> > Clearing KVM_ARM64_GUEST_HAS_SVE could be done on finalization failure;
> > it just feels a bit weird.
> 
> Let's leave it. It's safe now, and there's always a risk new code could
> screw something up, no matter which way we go now, so let's pick the
> simpler way.

OK, I can't argue with that.

Cheers
---Dave
Dave Martin April 10, 2019, 12:42 p.m. UTC | #12
On Fri, Apr 05, 2019 at 05:33:21PM +0200, Andrew Jones wrote:
> On Fri, Apr 05, 2019 at 01:54:13PM +0100, Dave Martin wrote:

[...]

> > If you think the above is enough for ABI documentation purposes, I will
> > aim to drop the following comment into set_sve_vls():
> > 
> > 	/*
> > 	 * Vector lengths supported by the host can't currently be
> > 	 * hidden from the guest individually: instead we can only set a
> > 	 * maximum via ZCR_EL2.LEN.  So, make sure the available vector
> > 	 * lengths match the set requested exactly up to the requested
> > 	 * maximum:
> > 	 */
> > 	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
> > 		if (vq_present(&vqs, vq) != sve_vq_available(vq))
> > 			return -EINVAL;
> > 
> > Do you think that's enough?
> >
> 
> That works for me, and I'm glad I understand it now, because this
> will be a key piece of the QEMU user interface to work out. I.e. we
> need to be able to provide the user with the current host's VL list
> up to the max virtualizable VL when queried, allow them to choose a max
> from that list for a guest, and then commit to that VL sublist for the
> guest. I'll dive into QEMU on Monday, hopefully I won't drown there.

Agreed.

How to make this user-friendly is a separate problem, provided that the
kernel provides userspace with sufficient tools to do the job.

Cheers
---Dave
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 98658f7..5475cc4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -23,7 +23,6 @@ 
 #define __ARM64_KVM_HOST_H__
 
 #include <linux/bitmap.h>
-#include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/jump_label.h>
 #include <linux/kvm_types.h>
@@ -50,6 +49,7 @@ 
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
+/* Will be incremented when KVM_ARM_VCPU_SVE is fully implemented: */
 #define KVM_VCPU_MAX_FEATURES 4
 
 #define KVM_REQ_SLEEP \
@@ -59,10 +59,12 @@ 
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
-static inline int kvm_arm_init_arch_resources(void) { return 0; }
+extern unsigned int kvm_sve_max_vl;
+int kvm_arm_init_arch_resources(void);
 
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
 int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext);
 void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
 
@@ -353,6 +355,7 @@  struct kvm_vcpu_arch {
 #define KVM_ARM64_HOST_SVE_IN_USE	(1 << 3) /* backup for host TIF_SVE */
 #define KVM_ARM64_HOST_SVE_ENABLED	(1 << 4) /* SVE enabled for EL0 */
 #define KVM_ARM64_GUEST_HAS_SVE		(1 << 5) /* SVE exposed to guest */
+#define KVM_ARM64_VCPU_SVE_FINALIZED	(1 << 6) /* SVE config completed */
 
 #define vcpu_has_sve(vcpu) (system_supports_sve() && \
 			    ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
@@ -525,7 +528,6 @@  static inline bool kvm_arch_requires_vhe(void)
 
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
-static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 
@@ -626,7 +628,10 @@  void kvm_arch_free_vm(struct kvm *kvm);
 
 int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
 
-#define kvm_arm_vcpu_finalize(vcpu, what) (-EINVAL)
-#define kvm_arm_vcpu_is_finalized(vcpu) true
+int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int what);
+bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
+
+#define kvm_arm_vcpu_sve_finalized(vcpu) \
+	((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
 
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index ced760c..6963b7e 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -102,6 +102,7 @@  struct kvm_regs {
 #define KVM_ARM_VCPU_EL1_32BIT		1 /* CPU running a 32bit VM */
 #define KVM_ARM_VCPU_PSCI_0_2		2 /* CPU uses PSCI v0.2 */
 #define KVM_ARM_VCPU_PMU_V3		3 /* Support guest PMUv3 */
+#define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
 
 struct kvm_vcpu_init {
 	__u32 target;
@@ -243,6 +244,10 @@  struct kvm_vcpu_events {
 					 ((n) << 5) | (i))
 #define KVM_REG_ARM64_SVE_FFR(i)	KVM_REG_ARM64_SVE_PREG(16, i)
 
+/* Vector lengths pseudo-register: */
+#define KVM_REG_ARM64_SVE_VLS		(KVM_REG_ARM64 | KVM_REG_ARM64_SVE | \
+					 KVM_REG_SIZE_U512 | 0xffff)
+
 /* Device Control API: ARM VGIC */
 #define KVM_DEV_ARM_VGIC_GRP_ADDR	0
 #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS	1
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 2aa80a5..086ab05 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -206,6 +206,73 @@  static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	return err;
 }
 
+#define vq_word(vq) (((vq) - SVE_VQ_MIN) / 64)
+#define vq_mask(vq) ((u64)1 << ((vq) - SVE_VQ_MIN) % 64)
+
+static bool vq_present(
+	const u64 (*const vqs)[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)],
+	unsigned int vq)
+{
+	return (*vqs)[vq_word(vq)] & vq_mask(vq);
+}
+
+static int get_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	unsigned int max_vq, vq;
+	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
+
+	if (WARN_ON(!sve_vl_valid(vcpu->arch.sve_max_vl)))
+		return -EINVAL;
+
+	memset(vqs, 0, sizeof(vqs));
+
+	max_vq = sve_vq_from_vl(vcpu->arch.sve_max_vl);
+	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
+		if (sve_vq_available(vq))
+			vqs[vq_word(vq)] |= vq_mask(vq);
+
+	if (copy_to_user((void __user *)reg->addr, vqs, sizeof(vqs)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int set_sve_vls(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	unsigned int max_vq, vq;
+	u64 vqs[DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
+
+	if (kvm_arm_vcpu_sve_finalized(vcpu))
+		return -EPERM; /* too late! */
+
+	if (WARN_ON(vcpu->arch.sve_state))
+		return -EINVAL;
+
+	if (copy_from_user(vqs, (const void __user *)reg->addr, sizeof(vqs)))
+		return -EFAULT;
+
+	max_vq = 0;
+	for (vq = SVE_VQ_MIN; vq <= SVE_VQ_MAX; ++vq)
+		if (vq_present(&vqs, vq))
+			max_vq = vq;
+
+	if (max_vq > sve_vq_from_vl(kvm_sve_max_vl))
+		return -EINVAL;
+
+	for (vq = SVE_VQ_MIN; vq <= max_vq; ++vq)
+		if (vq_present(&vqs, vq) != sve_vq_available(vq))
+			return -EINVAL;
+
+	/* Can't run with no vector lengths at all: */
+	if (max_vq < SVE_VQ_MIN)
+		return -EINVAL;
+
+	/* vcpu->arch.sve_state will be alloc'd by kvm_vcpu_finalize_sve() */
+	vcpu->arch.sve_max_vl = sve_vl_from_vq(max_vq);
+
+	return 0;
+}
+
 #define SVE_REG_SLICE_SHIFT	0
 #define SVE_REG_SLICE_BITS	5
 #define SVE_REG_ID_SHIFT	(SVE_REG_SLICE_SHIFT + SVE_REG_SLICE_BITS)
@@ -296,7 +363,19 @@  static int get_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	struct sve_state_reg_region region;
 	char __user *uptr = (char __user *)reg->addr;
 
-	if (!vcpu_has_sve(vcpu) || sve_reg_to_region(&region, vcpu, reg))
+	if (!vcpu_has_sve(vcpu))
+		return -ENOENT;
+
+	/* Handle the KVM_REG_ARM64_SVE_VLS pseudo-reg as a special case: */
+	if (reg->id == KVM_REG_ARM64_SVE_VLS)
+		return get_sve_vls(vcpu, reg);
+
+	/* Otherwise, reg is an architectural SVE register... */
+
+	if (!kvm_arm_vcpu_sve_finalized(vcpu))
+		return -EPERM;
+
+	if (sve_reg_to_region(&region, vcpu, reg))
 		return -ENOENT;
 
 	if (copy_to_user(uptr, vcpu->arch.sve_state + region.koffset,
@@ -312,7 +391,19 @@  static int set_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	struct sve_state_reg_region region;
 	const char __user *uptr = (const char __user *)reg->addr;
 
-	if (!vcpu_has_sve(vcpu) || sve_reg_to_region(&region, vcpu, reg))
+	if (!vcpu_has_sve(vcpu))
+		return -ENOENT;
+
+	/* Handle the KVM_REG_ARM64_SVE_VLS pseudo-reg as a special case: */
+	if (reg->id == KVM_REG_ARM64_SVE_VLS)
+		return set_sve_vls(vcpu, reg);
+
+	/* Otherwise, reg is an architectural SVE register... */
+
+	if (!kvm_arm_vcpu_sve_finalized(vcpu))
+		return -EPERM;
+
+	if (sve_reg_to_region(&region, vcpu, reg))
 		return -ENOENT;
 
 	if (copy_from_user(vcpu->arch.sve_state + region.koffset, uptr,
@@ -426,7 +517,11 @@  static unsigned long num_sve_regs(const struct kvm_vcpu *vcpu)
 	if (!vcpu_has_sve(vcpu))
 		return 0;
 
-	return slices * (SVE_NUM_PREGS + SVE_NUM_ZREGS + 1 /* FFR */);
+	/* Policed by KVM_GET_REG_LIST: */
+	WARN_ON(!kvm_arm_vcpu_sve_finalized(vcpu));
+
+	return slices * (SVE_NUM_PREGS + SVE_NUM_ZREGS + 1 /* FFR */)
+		+ 1; /* KVM_REG_ARM64_SVE_VLS */
 }
 
 static int copy_sve_reg_indices(const struct kvm_vcpu *vcpu,
@@ -441,6 +536,19 @@  static int copy_sve_reg_indices(const struct kvm_vcpu *vcpu,
 	if (!vcpu_has_sve(vcpu))
 		return 0;
 
+	/* Policed by KVM_GET_REG_LIST: */
+	WARN_ON(!kvm_arm_vcpu_sve_finalized(vcpu));
+
+	/*
+	 * Enumerate this first, so that userspace can save/restore in
+	 * the order reported by KVM_GET_REG_LIST:
+	 */
+	reg = KVM_REG_ARM64_SVE_VLS;
+	if (put_user(reg, uindices++))
+		return -EFAULT;
+
+	++num_regs;
+
 	for (i = 0; i < slices; i++) {
 		for (n = 0; n < SVE_NUM_ZREGS; n++) {
 			reg = KVM_REG_ARM64_SVE_ZREG(n, i);
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index f16a5f8..e7f9c06 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -23,11 +23,14 @@ 
 #include <linux/kvm_host.h>
 #include <linux/kvm.h>
 #include <linux/hw_breakpoint.h>
+#include <linux/slab.h>
+#include <linux/types.h>
 
 #include <kvm/arm_arch_timer.h>
 
 #include <asm/cpufeature.h>
 #include <asm/cputype.h>
+#include <asm/fpsimd.h>
 #include <asm/ptrace.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
@@ -99,6 +102,92 @@  int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	return r;
 }
 
+unsigned int kvm_sve_max_vl;
+
+int kvm_arm_init_arch_resources(void)
+{
+	if (system_supports_sve()) {
+		kvm_sve_max_vl = sve_max_virtualisable_vl;
+
+		/*
+		 * The get_sve_reg()/set_sve_reg() ioctl interface will need
+		 * to be extended with multiple register slice support in
+		 * order to support vector lengths greater than
+		 * SVE_VL_ARCH_MAX:
+		 */
+		if (WARN_ON(kvm_sve_max_vl > SVE_VL_ARCH_MAX))
+			kvm_sve_max_vl = SVE_VL_ARCH_MAX;
+
+		/*
+		 * Don't even try to make use of vector lengths that
+		 * aren't available on all CPUs, for now:
+		 */
+		if (kvm_sve_max_vl < sve_max_vl)
+			pr_warn("KVM: SVE vector length for guests limited to %u bytes\n",
+				kvm_sve_max_vl);
+	}
+
+	return 0;
+}
+
+/*
+ * Finalize vcpu's maximum SVE vector length, allocating
+ * vcpu->arch.sve_state as necessary.
+ */
+static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
+{
+	void *buf;
+	unsigned int vl;
+
+	vl = vcpu->arch.sve_max_vl;
+
+	/*
+	 * Responsibility for these properties is shared between
+	 * kvm_arm_init_arch_resources(), kvm_vcpu_enable_sve() and
+	 * set_sve_vls().  Double-check here just to be sure:
+	 */
+	if (WARN_ON(!sve_vl_valid(vl) || vl > sve_max_virtualisable_vl ||
+		    vl > SVE_VL_ARCH_MAX))
+		return -EIO;
+
+	buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	vcpu->arch.sve_state = buf;
+	vcpu->arch.flags |= KVM_ARM64_VCPU_SVE_FINALIZED;
+	return 0;
+}
+
+int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int what)
+{
+	switch (what) {
+	case KVM_ARM_VCPU_SVE:
+		if (!vcpu_has_sve(vcpu))
+			return -EINVAL;
+
+		if (kvm_arm_vcpu_sve_finalized(vcpu))
+			return -EPERM;
+
+		return kvm_vcpu_finalize_sve(vcpu);
+	}
+
+	return -EINVAL;
+}
+
+bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
+{
+	if (vcpu_has_sve(vcpu) && !kvm_arm_vcpu_sve_finalized(vcpu))
+		return false;
+
+	return true;
+}
+
+void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
+{
+	kfree(vcpu->arch.sve_state);
+}
+
 /**
  * kvm_reset_vcpu - sets core registers and sys_regs to reset value
  * @vcpu: The VCPU pointer