[v5,02/10] KVM: arm64: Setup a framework for hypercall bitmap firmware registers

Message ID 20220407011605.1966778-3-rananta@google.com (mailing list archive)
State New, archived
Series KVM: arm64: Add support for hypercall services selection

Commit Message

Raghavendra Rao Ananta April 7, 2022, 1:15 a.m. UTC
KVM regularly introduces new hypercall services to the guests without
any consent from userspace. This means the guests can observe
hypercall services appear and disappear as they migrate across various
host kernel versions. This could be a major problem if the guest
discovered a hypercall, started using it, and after getting migrated
to an older kernel realized that it's no longer available. Depending
on how the guest handles the change, there's a chance that the guest
would simply panic.

As a result, userspace needs a way to elect the services that it
wishes the guest to discover. It can elect these services based on
the kernels spread across its (migration) fleet. To remedy this,
extend the existing firmware pseudo-registers, such as
KVM_REG_ARM_PSCI_VERSION, by creating a new COPROC register space
for all the hypercall services available.

These firmware registers are categorized based on the service call
owners, but unlike the existing firmware pseudo-registers, they hold
the features supported in the form of a bitmap.

During VM initialization, each register is set to the upper limit of
the features it supports. It's expected that the VMMs discover the
features provided by each register via GET_ONE_REG, and write back
the desired values using SET_ONE_REG. KVM allows this modification
only until the VM has started.
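
A minimal sketch of this handshake from the VMM side (the register and
bit macros come from this series' uapi header; the vcpu fd handling and
the bit being cleared are illustrative):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Read the upper limit of the bitmap, then shrink it before any KVM_RUN. */
static int vmm_trim_std_services(int vcpu_fd)
{
        __u64 bmap;
        struct kvm_one_reg reg = {
                .id   = KVM_REG_ARM_STD_BMAP,
                .addr = (__u64)(unsigned long)&bmap,
        };

        if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))      /* discover */
                return -1;

        bmap &= ~KVM_REG_ARM_STD_BIT_TRNG_V1_0;         /* e.g. hide TRNG v1.0 */

        return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);   /* write back */
}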

Some of the standard features are not mapped to any bits of the
registers. But since they can recreate the original problem of
making them available without userspace's consent, they need to
be explicitly added to hvc_func_default_allowed_list[]. Any
function-id that's not enabled via the bitmap, and not listed in
hvc_func_default_allowed_list[], is returned to the guest as
SMCCC_RET_NOT_SUPPORTED.
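
From the guest's point of view, a filtered function-id simply probes as
absent. A minimal guest-side sketch, using the kernel's SMCCC invocation
helper from linux/arm-smccc.h (the surrounding driver context is assumed):

struct arm_smccc_res res;

arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_VERSION, &res);
if ((int)res.a0 == SMCCC_RET_NOT_SUPPORTED)
        return -EOPNOTSUPP;     /* service hidden by the VMM */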

Older userspace code can simply ignore the feature and the
hypercall services will be exposed unconditionally to the guests,
thus ensuring backward compatibility.

In this patch, the framework adds the register only for ARM's standard
secure services (owner value 4). Currently, this includes support only
for the ARM True Random Number Generator (TRNG) service, with bit-0 of
the register representing mandatory features of v1.0. Other services
will be added in the upcoming patches.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  12 ++++
 arch/arm64/include/uapi/asm/kvm.h |   9 +++
 arch/arm64/kvm/arm.c              |   1 +
 arch/arm64/kvm/guest.c            |   8 ++-
 arch/arm64/kvm/hypercalls.c       | 102 ++++++++++++++++++++++++++++++
 include/kvm/arm_hypercalls.h      |   7 ++
 include/kvm/arm_psci.h            |  12 ++++
 7 files changed, 149 insertions(+), 2 deletions(-)
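
For reference, KVM_REG_ARM_STD_BMAP is a fixed 64-bit register ID.
Assuming the usual arm64 uapi constants (KVM_REG_ARM64 ==
0x6000000000000000, KVM_REG_SIZE_U64 == 0x0030000000000000,
KVM_REG_ARM_COPROC_SHIFT == 16), it expands to:

KVM_REG_ARM64 | KVM_REG_SIZE_U64 |      /* 64-bit arm64 register */
(0x0016 << 16) |                        /* FW_FEAT_BMAP coproc space */
0x0000                                  /* register index 0 */
== 0x6030000000160000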

Comments

Marc Zyngier April 7, 2022, 9:06 a.m. UTC | #1
Hi Raghavendra,

On Thu, 07 Apr 2022 02:15:57 +0100,
Raghavendra Rao Ananta <rananta@google.com> wrote:
> 
> KVM regularly introduces new hypercall services to the guests without
> any consent from the userspace. This means, the guests can observe
> hypercall services in and out as they migrate across various host
> kernel versions. This could be a major problem if the guest
> discovered a hypercall, started using it, and after getting migrated
> to an older kernel realizes that it's no longer available. Depending
> on how the guest handles the change, there's a potential chance that
> the guest would just panic.
> 
> As a result, there's a need for the userspace to elect the services
> that it wishes the guest to discover. It can elect these services
> based on the kernels spread across its (migration) fleet. To remedy
> this, extend the existing firmware psuedo-registers, such as

nit: pseudo

> KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
> for all the hypercall services available.
> 
> These firmware registers are categorized based on the service call
> owners, but unlike the existing firmware psuedo-registers, they hold

nit: pseudo again

> the features supported in the form of a bitmap.
> 
> During the VM initialization, the registers are set to upper-limit of
> the features supported by the corresponding registers. It's expected
> that the VMMs discover the features provided by each register via
> GET_ONE_REG, and writeback the desired values using SET_ONE_REG.

nit: write back

> KVM allows this modification only until the VM has started.
> 
> Some of the standard features are not mapped to any bits of the
> registers. But since they can recreate the original problem of
> making it available without userspace's consent, they need to
> be explicitly added to the hvc_func_default_allowed_list[]. Any
> function-id that's not enabled via the bitmap, or not listed in
> hvc_func_default_allowed_list[], will be returned as
> SMCCC_RET_NOT_SUPPORTED to the guest.
> 
> Older userspace code can simply ignore the feature and the
> hypercall services will be exposed unconditionally to the guests,
> thus ensuring backward compatibility.
> 
> In this patch, the framework adds the register only for ARM's standard
> secure services (owner value 4). Currently, this includes support only
> for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> register representing mandatory features of v1.0. Other services are
> momentarily added in the upcoming patches.
> 
> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  12 ++++
>  arch/arm64/include/uapi/asm/kvm.h |   9 +++
>  arch/arm64/kvm/arm.c              |   1 +
>  arch/arm64/kvm/guest.c            |   8 ++-
>  arch/arm64/kvm/hypercalls.c       | 102 ++++++++++++++++++++++++++++++
>  include/kvm/arm_hypercalls.h      |   7 ++
>  include/kvm/arm_psci.h            |  12 ++++
>  7 files changed, 149 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index e3b25dc6c367..6e663383d7b4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -101,6 +101,15 @@ struct kvm_s2_mmu {
>  struct kvm_arch_memory_slot {
>  };
>  
> +/**
> + * struct kvm_smccc_features: Descriptor the hypercall services exposed to the guests
> + *
> + * @std_bmap: Bitmap of standard secure service calls
> + */
> +struct kvm_smccc_features {
> +	u64 std_bmap;

Consider using 'unsigned long' for bitmaps.

> +};
> +
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> @@ -140,6 +149,9 @@ struct kvm_arch {
>  
>  	u8 pfr0_csv2;
>  	u8 pfr0_csv3;
> +
> +	/* Hypercall features firmware registers' descriptor */
> +	struct kvm_smccc_features smccc_feat;
>  };
>  
>  struct kvm_vcpu_fault_info {
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index c1b6ddc02d2f..56e4bc58a355 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -332,6 +332,15 @@ struct kvm_arm_copy_mte_tags {
>  #define KVM_ARM64_SVE_VLS_WORDS	\
>  	((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
>  
> +/* Bitmap feature firmware registers */
> +#define KVM_REG_ARM_FW_FEAT_BMAP		(0x0016 << KVM_REG_ARM_COPROC_SHIFT)
> +#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r)		(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> +						KVM_REG_ARM_FW_FEAT_BMAP |	\
> +						((r) & 0xffff))
> +
> +#define KVM_REG_ARM_STD_BMAP			KVM_REG_ARM_FW_FEAT_BMAP_REG(0)
> +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0		BIT(0)

I'm really in two minds about this. Having one bit per service is easy
from an implementation perspective, but it also means that this
disallows fine-grained control over which hypercalls are actually
available. If tomorrow TRNG 1.1 adds a new hypercall and KVM
implements both, how does the selection mechanism work? You will
need a version selector (a la PSCI), which somehow defeats this API
(and renders the name of the #define invalid).

I wonder if a more correct way to look at this is to enumerate the
hypercalls themselves (all 5 of them), though coming up with an
encoding is tricky (RNG32 and RNG64 would clash, for example).

Thoughts?

> +
>  /* Device Control API: ARM VGIC */
>  #define KVM_DEV_ARM_VGIC_GRP_ADDR	0
>  #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS	1
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 523bc934fe2f..a37fadbd617e 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
>  
>  	set_default_spectre(kvm);
> +	kvm_arm_init_hypercalls(kvm);
>  
>  	return ret;
>  out_free_stage2_pgd:
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 0d5cca56cbda..8c607199cad1 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -756,7 +756,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  
>  	switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
>  	case KVM_REG_ARM_CORE:	return get_core_reg(vcpu, reg);
> -	case KVM_REG_ARM_FW:	return kvm_arm_get_fw_reg(vcpu, reg);
> +	case KVM_REG_ARM_FW:
> +	case KVM_REG_ARM_FW_FEAT_BMAP:
> +		return kvm_arm_get_fw_reg(vcpu, reg);
>  	case KVM_REG_ARM64_SVE:	return get_sve_reg(vcpu, reg);
>  	}
>  
> @@ -774,7 +776,9 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  
>  	switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
>  	case KVM_REG_ARM_CORE:	return set_core_reg(vcpu, reg);
> -	case KVM_REG_ARM_FW:	return kvm_arm_set_fw_reg(vcpu, reg);
> +	case KVM_REG_ARM_FW:
> +	case KVM_REG_ARM_FW_FEAT_BMAP:
> +		return kvm_arm_set_fw_reg(vcpu, reg);
>  	case KVM_REG_ARM64_SVE:	return set_sve_reg(vcpu, reg);
>  	}
>  
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index fa6d9378d8e7..cf04b5ee5f56 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -58,6 +58,53 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
>  	val[3] = lower_32_bits(cycles);
>  }
>  
> +/*
> + * List of function-ids that are not gated with the bitmapped feature
> + * firmware registers, and are to be allowed for servicing the call by default.
> + */
> +static const u32 hvc_func_default_allowed_list[] = {
> +	ARM_SMCCC_VERSION_FUNC_ID,
> +	ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> +	ARM_SMCCC_HV_PV_TIME_FEATURES,
> +	ARM_SMCCC_HV_PV_TIME_ST,
> +	ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID,
> +	ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
> +	ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
> +};
> +
> +static bool kvm_hvc_call_default_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(hvc_func_default_allowed_list); i++)
> +		if (func_id == hvc_func_default_allowed_list[i])
> +			return true;

Huh, this really is ugly. This array is bound to become bigger over
time, meaning that the average hypercall time is going to increase. At
the very least, this should be turned into a switch/case statement, as
the compiler is pretty good at building a search tree (better than this
naive loop, for a start), and we have those everywhere else.
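
For illustration, the switch-based shape being asked for would look
roughly like this (a sketch with the same function-ids as the array
above):

static bool kvm_hvc_call_default_allowed(struct kvm_vcpu *vcpu, u32 func_id)
{
	switch (func_id) {
	/*
	 * Function-ids that are not gated with the bitmapped feature
	 * firmware registers, and are allowed by default.
	 */
	case ARM_SMCCC_VERSION_FUNC_ID:
	case ARM_SMCCC_ARCH_FEATURES_FUNC_ID:
	case ARM_SMCCC_HV_PV_TIME_FEATURES:
	case ARM_SMCCC_HV_PV_TIME_ST:
	case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
		return true;
	default:
		/* PSCI 0.1 and the SMCCC-compliant PSCI range */
		return kvm_psci_func_id_is_valid(vcpu, func_id);
	}
}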

> +
> +	return kvm_psci_func_id_is_valid(vcpu, func_id);
> +}
> +
> +static bool kvm_arm_fw_reg_feat_enabled(u64 reg_bmap, u64 feat_bit)
> +{
> +	return reg_bmap & feat_bit;
> +}

We really don't need to reimplement test_bit().
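
(Assuming std_bmap becomes an unsigned long and the feature #defines
become bit positions rather than BIT() masks, the helper reduces to a
plain test_bit() call; a sketch:)

	return test_bit(KVM_REG_ARM_STD_BIT_TRNG_V1_0, &smccc_feat->std_bmap);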

> +
> +static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> +{
> +	struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> +
> +	switch (func_id) {
> +	case ARM_SMCCC_TRNG_VERSION:
> +	case ARM_SMCCC_TRNG_FEATURES:
> +	case ARM_SMCCC_TRNG_GET_UUID:
> +	case ARM_SMCCC_TRNG_RND32:
> +	case ARM_SMCCC_TRNG_RND64:
> +		return kvm_arm_fw_reg_feat_enabled(smccc_feat->std_bmap,
> +						KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> +	default:
> +		return kvm_hvc_call_default_allowed(vcpu, func_id);
> +	}
> +}
> +
>  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>  {
>  	u32 func_id = smccc_get_function(vcpu);
> @@ -65,6 +112,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>  	u32 feature;
>  	gpa_t gpa;
>  
> +	if (!kvm_hvc_call_allowed(vcpu, func_id))
> +		goto out;
> +
>  	switch (func_id) {
>  	case ARM_SMCCC_VERSION_FUNC_ID:
>  		val[0] = ARM_SMCCC_VERSION_1_1;
> @@ -155,6 +205,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
>  		return kvm_psci_call(vcpu);
>  	}
>  
> +out:
>  	smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
>  	return 1;
>  }
> @@ -164,8 +215,16 @@ static const u64 kvm_arm_fw_reg_ids[] = {
>  	KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
>  	KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
>  	KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3,
> +	KVM_REG_ARM_STD_BMAP,
>  };
>  
> +void kvm_arm_init_hypercalls(struct kvm *kvm)
> +{
> +	struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> +
> +	smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
> +}
> +
>  int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
>  {
>  	return ARRAY_SIZE(kvm_arm_fw_reg_ids);
> @@ -237,6 +296,7 @@ static int get_kernel_wa_level(u64 regid)
>  
>  int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  {
> +	struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
>  	void __user *uaddr = (void __user *)(long)reg->addr;
>  	u64 val;
>  
> @@ -249,6 +309,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  	case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
>  		val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
>  		break;
> +	case KVM_REG_ARM_STD_BMAP:
> +		val = READ_ONCE(smccc_feat->std_bmap);
> +		break;
>  	default:
>  		return -ENOENT;
>  	}
> @@ -259,6 +322,43 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  	return 0;
>  }
>  
> +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
> +{
> +	int ret = 0;
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> +	u64 *fw_reg_bmap, fw_reg_features;
> +
> +	switch (reg_id) {
> +	case KVM_REG_ARM_STD_BMAP:
> +		fw_reg_bmap = &smccc_feat->std_bmap;
> +		fw_reg_features = KVM_ARM_SMCCC_STD_FEATURES;
> +		break;
> +	default:
> +		return -ENOENT;
> +	}
> +
> +	/* Check for unsupported bit */
> +	if (val & ~fw_reg_features)
> +		return -EINVAL;
> +
> +	mutex_lock(&kvm->lock);
> +
> +	/*
> +	 * If the VM (any vCPU) has already started running, return success
> +	 * if there's no change in the value. Else, return -EBUSY.

No, this should *always* fail if a vcpu has started. Otherwise, you
start allowing hard to spot races.

> +	 */
> +	if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
> +		ret = *fw_reg_bmap != val ? -EBUSY : 0;
> +		goto out;
> +	}
> +
> +	WRITE_ONCE(*fw_reg_bmap, val);

I'm not sure what this WRITE_ONCE guards against. Do you expect
concurrent reads at this stage?

> +out:
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
> +
>  int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  {
>  	void __user *uaddr = (void __user *)(long)reg->addr;
> @@ -337,6 +437,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  			return -EINVAL;
>  
>  		return 0;
> +	case KVM_REG_ARM_STD_BMAP:
> +		return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
>  	default:
>  		return -ENOENT;
>  	}
> diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> index 5d38628a8d04..fd3ff350ee9d 100644
> --- a/include/kvm/arm_hypercalls.h
> +++ b/include/kvm/arm_hypercalls.h
> @@ -6,6 +6,12 @@
>  
>  #include <asm/kvm_emulate.h>
>  
> +/* Last valid bits of the bitmapped firmware registers */
> +#define KVM_REG_ARM_STD_BMAP_BIT_MAX		0
> +
> +#define KVM_ARM_SMCCC_STD_FEATURES \
> +	GENMASK_ULL(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
> +
>  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
>  
>  static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> @@ -42,6 +48,7 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
>  
>  struct kvm_one_reg;
>  
> +void kvm_arm_init_hypercalls(struct kvm *kvm);
>  int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
>  int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
>  int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
> index 6e55b9283789..d7a87367de56 100644
> --- a/include/kvm/arm_psci.h
> +++ b/include/kvm/arm_psci.h
> @@ -36,6 +36,18 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
>  	return KVM_ARM_PSCI_0_1;
>  }
>  
> +static inline bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id)
> +{
> +	/* PSCI 0.1 doesn't comply with the standard SMCCC */
> +	if (kvm_psci_version(vcpu) == KVM_ARM_PSCI_0_1)
> +		return (func_id == KVM_PSCI_FN_CPU_OFF || func_id == KVM_PSCI_FN_CPU_ON);
> +
> +	if (ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
> +		ARM_SMCCC_FUNC_NUM(func_id) >= 0 && ARM_SMCCC_FUNC_NUM(func_id) <= 0x1f)
> +		return true;
> +
> +	return false;
> +}

Why the inline function? Do you expect this to be shared with
something else? If not, I'd rather you move it into the caller.

>  
>  int kvm_psci_call(struct kvm_vcpu *vcpu);
>  

Thanks,

	M.
Raghavendra Rao Ananta April 7, 2022, 5:24 p.m. UTC | #2
Hi Marc,

On Thu, Apr 7, 2022 at 2:07 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Raghavendra,
>
> On Thu, 07 Apr 2022 02:15:57 +0100,
> Raghavendra Rao Ananta <rananta@google.com> wrote:
> >
> > KVM regularly introduces new hypercall services to the guests without
> > any consent from the userspace. This means, the guests can observe
> > hypercall services in and out as they migrate across various host
> > kernel versions. This could be a major problem if the guest
> > discovered a hypercall, started using it, and after getting migrated
> > to an older kernel realizes that it's no longer available. Depending
> > on how the guest handles the change, there's a potential chance that
> > the guest would just panic.
> >
> > As a result, there's a need for the userspace to elect the services
> > that it wishes the guest to discover. It can elect these services
> > based on the kernels spread across its (migration) fleet. To remedy
> > this, extend the existing firmware psuedo-registers, such as
>
> nit: pseudo
>
> > KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
> > for all the hypercall services available.
> >
> > These firmware registers are categorized based on the service call
> > owners, but unlike the existing firmware psuedo-registers, they hold
>
> nit: pseudo again
>
> > the features supported in the form of a bitmap.
> >
> > During the VM initialization, the registers are set to upper-limit of
> > the features supported by the corresponding registers. It's expected
> > that the VMMs discover the features provided by each register via
> > GET_ONE_REG, and writeback the desired values using SET_ONE_REG.
>
> nit: write back
>
> > KVM allows this modification only until the VM has started.
> >
> > Some of the standard features are not mapped to any bits of the
> > registers. But since they can recreate the original problem of
> > making it available without userspace's consent, they need to
> > be explicitly added to the hvc_func_default_allowed_list[]. Any
> > function-id that's not enabled via the bitmap, or not listed in
> > hvc_func_default_allowed_list[], will be returned as
> > SMCCC_RET_NOT_SUPPORTED to the guest.
> >
> > Older userspace code can simply ignore the feature and the
> > hypercall services will be exposed unconditionally to the guests,
> > thus ensuring backward compatibility.
> >
> > In this patch, the framework adds the register only for ARM's standard
> > secure services (owner value 4). Currently, this includes support only
> > for ARM True Random Number Generator (TRNG) service, with bit-0 of the
> > register representing mandatory features of v1.0. Other services are
> > momentarily added in the upcoming patches.
> >
> > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  12 ++++
> >  arch/arm64/include/uapi/asm/kvm.h |   9 +++
> >  arch/arm64/kvm/arm.c              |   1 +
> >  arch/arm64/kvm/guest.c            |   8 ++-
> >  arch/arm64/kvm/hypercalls.c       | 102 ++++++++++++++++++++++++++++++
> >  include/kvm/arm_hypercalls.h      |   7 ++
> >  include/kvm/arm_psci.h            |  12 ++++
> >  7 files changed, 149 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index e3b25dc6c367..6e663383d7b4 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -101,6 +101,15 @@ struct kvm_s2_mmu {
> >  struct kvm_arch_memory_slot {
> >  };
> >
> > +/**
> > + * struct kvm_smccc_features: Descriptor the hypercall services exposed to the guests
> > + *
> > + * @std_bmap: Bitmap of standard secure service calls
> > + */
> > +struct kvm_smccc_features {
> > +     u64 std_bmap;
>
> Consider using 'unsigned long' for bitmaps.
>
Sure.

> > +};
> > +
> >  struct kvm_arch {
> >       struct kvm_s2_mmu mmu;
> >
> > @@ -140,6 +149,9 @@ struct kvm_arch {
> >
> >       u8 pfr0_csv2;
> >       u8 pfr0_csv3;
> > +
> > +     /* Hypercall features firmware registers' descriptor */
> > +     struct kvm_smccc_features smccc_feat;
> >  };
> >
> >  struct kvm_vcpu_fault_info {
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index c1b6ddc02d2f..56e4bc58a355 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -332,6 +332,15 @@ struct kvm_arm_copy_mte_tags {
> >  #define KVM_ARM64_SVE_VLS_WORDS      \
> >       ((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
> >
> > +/* Bitmap feature firmware registers */
> > +#define KVM_REG_ARM_FW_FEAT_BMAP             (0x0016 << KVM_REG_ARM_COPROC_SHIFT)
> > +#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r)              (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> > +                                             KVM_REG_ARM_FW_FEAT_BMAP |      \
> > +                                             ((r) & 0xffff))
> > +
> > +#define KVM_REG_ARM_STD_BMAP                 KVM_REG_ARM_FW_FEAT_BMAP_REG(0)
> > +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0                BIT(0)
>
> I'm really in two minds about this. Having one bit per service is easy
> from an implementation perspective, but it also means that this
> disallows fine-grained control over which hypercalls are actually
> available. If tomorrow TRNG 1.1 adds a new hypercall and KVM
> implements both, how does the selection mechanism work? You will
> need a version selector (a la PSCI), which somehow defeats this API
> (and renders the name of the #define invalid).
>
> I wonder if a more correct way to look at this is to enumerate the
> hypercalls themselves (all 5 of them), though coming up with an
> encoding is tricky (RNG32 and RNG64 would clash, for example).
>
> Thoughts?
>
I was on the fence about this too. The TRNG spec (ARM DEN 0098,
Table-4) mentions that v1.0 should have VERSION, FEATURES, GET_UUID,
and RND as mandatory features. Hence, if KVM advertised that it
supports TRNG v1.0, I thought it would be best to expose all or
nothing of v1.0 by guarding them with a single bit.
Broadly, the idea is to have a bit per version. If v1.1 comes along,
we can have another bit for that. If it's not too ugly to implement,
we can be a little more aggressive and ensure that userspace doesn't
enable v1.1 without enabling v1.0.
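
Such an ordering rule would amount to a small check in the SET_ONE_REG
path; a sketch, where the v1.1 bit is hypothetical and not part of this
series:

	/* Hypothetical: refuse enabling v1.1 (bit 1) without v1.0 (bit 0). */
	if ((val & KVM_REG_ARM_STD_BIT_TRNG_V1_1) &&
	    !(val & KVM_REG_ARM_STD_BIT_TRNG_V1_0))
		return -EINVAL;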

> > +
> >  /* Device Control API: ARM VGIC */
> >  #define KVM_DEV_ARM_VGIC_GRP_ADDR    0
> >  #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS       1
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 523bc934fe2f..a37fadbd617e 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -156,6 +156,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >       kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
> >
> >       set_default_spectre(kvm);
> > +     kvm_arm_init_hypercalls(kvm);
> >
> >       return ret;
> >  out_free_stage2_pgd:
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index 0d5cca56cbda..8c607199cad1 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -756,7 +756,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >
> >       switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> >       case KVM_REG_ARM_CORE:  return get_core_reg(vcpu, reg);
> > -     case KVM_REG_ARM_FW:    return kvm_arm_get_fw_reg(vcpu, reg);
> > +     case KVM_REG_ARM_FW:
> > +     case KVM_REG_ARM_FW_FEAT_BMAP:
> > +             return kvm_arm_get_fw_reg(vcpu, reg);
> >       case KVM_REG_ARM64_SVE: return get_sve_reg(vcpu, reg);
> >       }
> >
> > @@ -774,7 +776,9 @@ int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >
> >       switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
> >       case KVM_REG_ARM_CORE:  return set_core_reg(vcpu, reg);
> > -     case KVM_REG_ARM_FW:    return kvm_arm_set_fw_reg(vcpu, reg);
> > +     case KVM_REG_ARM_FW:
> > +     case KVM_REG_ARM_FW_FEAT_BMAP:
> > +             return kvm_arm_set_fw_reg(vcpu, reg);
> >       case KVM_REG_ARM64_SVE: return set_sve_reg(vcpu, reg);
> >       }
> >
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index fa6d9378d8e7..cf04b5ee5f56 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -58,6 +58,53 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> >       val[3] = lower_32_bits(cycles);
> >  }
> >
> > +/*
> > + * List of function-ids that are not gated with the bitmapped feature
> > + * firmware registers, and are to be allowed for servicing the call by default.
> > + */
> > +static const u32 hvc_func_default_allowed_list[] = {
> > +     ARM_SMCCC_VERSION_FUNC_ID,
> > +     ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> > +     ARM_SMCCC_HV_PV_TIME_FEATURES,
> > +     ARM_SMCCC_HV_PV_TIME_ST,
> > +     ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID,
> > +     ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
> > +     ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
> > +};
> > +
> > +static bool kvm_hvc_call_default_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> > +{
> > +     unsigned int i;
> > +
> > +     for (i = 0; i < ARRAY_SIZE(hvc_func_default_allowed_list); i++)
> > +             if (func_id == hvc_func_default_allowed_list[i])
> > +                     return true;
>
> Huh, this really is ugly. This array is bound to become bigger over
> time, meaning that the average hypercall time is going to increase. At
> the very least, this should be turned into a switch/case statement, as
> the compiler is pretty good at building a search tree (better than this
> naive loop, for a start), and we have those everywhere else.
>
Makes sense. I'll make it a switch-case.

> > +
> > +     return kvm_psci_func_id_is_valid(vcpu, func_id);
> > +}
> > +
> > +static bool kvm_arm_fw_reg_feat_enabled(u64 reg_bmap, u64 feat_bit)
> > +{
> > +     return reg_bmap & feat_bit;
> > +}
>
> We really don't need to reimplement test_bit().
>
Right, I forgot about test_bit() :)

> > +
> > +static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
> > +{
> > +     struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> > +
> > +     switch (func_id) {
> > +     case ARM_SMCCC_TRNG_VERSION:
> > +     case ARM_SMCCC_TRNG_FEATURES:
> > +     case ARM_SMCCC_TRNG_GET_UUID:
> > +     case ARM_SMCCC_TRNG_RND32:
> > +     case ARM_SMCCC_TRNG_RND64:
> > +             return kvm_arm_fw_reg_feat_enabled(smccc_feat->std_bmap,
> > +                                             KVM_REG_ARM_STD_BIT_TRNG_V1_0);
> > +     default:
> > +             return kvm_hvc_call_default_allowed(vcpu, func_id);
> > +     }
> > +}
> > +
> >  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> >  {
> >       u32 func_id = smccc_get_function(vcpu);
> > @@ -65,6 +112,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> >       u32 feature;
> >       gpa_t gpa;
> >
> > +     if (!kvm_hvc_call_allowed(vcpu, func_id))
> > +             goto out;
> > +
> >       switch (func_id) {
> >       case ARM_SMCCC_VERSION_FUNC_ID:
> >               val[0] = ARM_SMCCC_VERSION_1_1;
> > @@ -155,6 +205,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> >               return kvm_psci_call(vcpu);
> >       }
> >
> > +out:
> >       smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> >       return 1;
> >  }
> > @@ -164,8 +215,16 @@ static const u64 kvm_arm_fw_reg_ids[] = {
> >       KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
> >       KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
> >       KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3,
> > +     KVM_REG_ARM_STD_BMAP,
> >  };
> >
> > +void kvm_arm_init_hypercalls(struct kvm *kvm)
> > +{
> > +     struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> > +
> > +     smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
> > +}
> > +
> >  int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
> >  {
> >       return ARRAY_SIZE(kvm_arm_fw_reg_ids);
> > @@ -237,6 +296,7 @@ static int get_kernel_wa_level(u64 regid)
> >
> >  int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >  {
> > +     struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
> >       void __user *uaddr = (void __user *)(long)reg->addr;
> >       u64 val;
> >
> > @@ -249,6 +309,9 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >       case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
> >               val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
> >               break;
> > +     case KVM_REG_ARM_STD_BMAP:
> > +             val = READ_ONCE(smccc_feat->std_bmap);
> > +             break;
> >       default:
> >               return -ENOENT;
> >       }
> > @@ -259,6 +322,43 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >       return 0;
> >  }
> >
> > +static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
> > +{
> > +     int ret = 0;
> > +     struct kvm *kvm = vcpu->kvm;
> > +     struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
> > +     u64 *fw_reg_bmap, fw_reg_features;
> > +
> > +     switch (reg_id) {
> > +     case KVM_REG_ARM_STD_BMAP:
> > +             fw_reg_bmap = &smccc_feat->std_bmap;
> > +             fw_reg_features = KVM_ARM_SMCCC_STD_FEATURES;
> > +             break;
> > +     default:
> > +             return -ENOENT;
> > +     }
> > +
> > +     /* Check for unsupported bit */
> > +     if (val & ~fw_reg_features)
> > +             return -EINVAL;
> > +
> > +     mutex_lock(&kvm->lock);
> > +
> > +     /*
> > +      * If the VM (any vCPU) has already started running, return success
> > +      * if there's no change in the value. Else, return -EBUSY.
>
> No, this should *always* fail if a vcpu has started. Otherwise, you
> start allowing hard to spot races.
>
The idea came from the fact that userspace could spawn multiple
threads to configure the vCPU registers. Since we don't have
VM-scoped registers yet, it's possible that userspace has issued
a KVM_RUN on one of the vCPUs while the others are lagging behind,
still configuring the registers. The slower threads may see -EBUSY
and could panic. But if you feel that it's overkill and userspace
should deal with it, we can return -EBUSY for all writes after KVM_RUN.

> > +      */
> > +     if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
> > +             ret = *fw_reg_bmap != val ? -EBUSY : 0;
> > +             goto out;
> > +     }
> > +
> > +     WRITE_ONCE(*fw_reg_bmap, val);
>
> I'm not sure what this WRITE_ONCE guards against. Do you expect
> concurrent reads at this stage?
>
Again, the assumption here is that userspace could have multiple
threads reading and writing to these registers. Without the VM scoped
registers in place, we may end up with a read/write to the same memory
location for all the vCPUs.

> > +out:
> > +     mutex_unlock(&kvm->lock);
> > +     return ret;
> > +}
> > +
> >  int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >  {
> >       void __user *uaddr = (void __user *)(long)reg->addr;
> > @@ -337,6 +437,8 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >                       return -EINVAL;
> >
> >               return 0;
> > +     case KVM_REG_ARM_STD_BMAP:
> > +             return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
> >       default:
> >               return -ENOENT;
> >       }
> > diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
> > index 5d38628a8d04..fd3ff350ee9d 100644
> > --- a/include/kvm/arm_hypercalls.h
> > +++ b/include/kvm/arm_hypercalls.h
> > @@ -6,6 +6,12 @@
> >
> >  #include <asm/kvm_emulate.h>
> >
> > +/* Last valid bits of the bitmapped firmware registers */
> > +#define KVM_REG_ARM_STD_BMAP_BIT_MAX         0
> > +
> > +#define KVM_ARM_SMCCC_STD_FEATURES \
> > +     GENMASK_ULL(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
> > +
> >  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
> >
> >  static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
> > @@ -42,6 +48,7 @@ static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
> >
> >  struct kvm_one_reg;
> >
> > +void kvm_arm_init_hypercalls(struct kvm *kvm);
> >  int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
> >  int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> >  int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> > diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
> > index 6e55b9283789..d7a87367de56 100644
> > --- a/include/kvm/arm_psci.h
> > +++ b/include/kvm/arm_psci.h
> > @@ -36,6 +36,18 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
> >       return KVM_ARM_PSCI_0_1;
> >  }
> >
> > +static inline bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id)
> > +{
> > +     /* PSCI 0.1 doesn't comply with the standard SMCCC */
> > +     if (kvm_psci_version(vcpu) == KVM_ARM_PSCI_0_1)
> > +             return (func_id == KVM_PSCI_FN_CPU_OFF || func_id == KVM_PSCI_FN_CPU_ON);
> > +
> > +     if (ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
> > +             ARM_SMCCC_FUNC_NUM(func_id) >= 0 && ARM_SMCCC_FUNC_NUM(func_id) <= 0x1f)
> > +             return true;
> > +
> > +     return false;
> > +}
>
> Why the inline function? Do you expect this to be shared with
> something else? If not, I'd rather you move it into the caller.
>
Well, no plans to share as of yet. Will move it to psci.c

> >
> >  int kvm_psci_call(struct kvm_vcpu *vcpu);
> >
>

Thanks for the review, Marc. I'll fix all the other nits.

Regards,
Raghavendra

> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.
Marc Zyngier April 8, 2022, 4:59 p.m. UTC | #3
On Thu, 07 Apr 2022 18:24:14 +0100,
Raghavendra Rao Ananta <rananta@google.com> wrote:
> 
> Hi Marc,
> 
> > > +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0                BIT(0)
> >
> > I'm really in two minds about this. Having one bit per service is easy
> > from an implementation perspective, but it also means that this
> > disallows fine-grained control over which hypercalls are actually
> > available. If tomorrow TRNG 1.1 adds a new hypercall and KVM
> > implements both, how does the selection mechanism work? You will
> > need a version selector (a la PSCI), which somehow defeats this API
> > (and renders the name of the #define invalid).
> >
> > I wonder if a more correct way to look at this is to enumerate the
> > hypercalls themselves (all 5 of them), though coming up with an
> > encoding is tricky (RNG32 and RNG64 would clash, for example).
> >
> > Thoughts?
> >
> I was on the fence about this too. The TRNG spec (ARM DEN 0098,
> Table-4) mentions that v1.0 should have VERSION, FEATURES, GET_UUID,
> and RND as mandatory features. Hence, if KVM advertised that it
> supports TRNG v1.0, I thought it would be best to expose all or
> nothing of v1.0 by guarding them with a single bit.
> Broadly, the idea is to have a bit per version. If v1.1 comes along,
> we can have another bit for that. If it's not too ugly to implement,
> we can be a little more aggressive and ensure that userspace doesn't
> enable v1.1 without enabling v1.0.

OK, that'd be assuming that we'll never see a service where version A
is incompatible with version B and that we have to exclude one or the
other. Meh. Let's cross that bridge once it is actually built.

[...]

> > > +     mutex_lock(&kvm->lock);
> > > +
> > > +     /*
> > > +      * If the VM (any vCPU) has already started running, return success
> > > +      * if there's no change in the value. Else, return -EBUSY.
> >
> > No, this should *always* fail if a vcpu has started. Otherwise, you
> > start allowing hard to spot races.
> >
> The idea came from the fact that userspace could spawn multiple
> threads to configure the vCPU registers. Since we don't have
> VM-scoped registers yet, it's possible that userspace has issued
> a KVM_RUN on one of the vCPUs while the others are lagging behind,
> still configuring the registers. The slower threads may see -EBUSY
> and could panic. But if you feel that it's overkill and userspace
> should deal with it, we can return -EBUSY for all writes after KVM_RUN.

I'd rather have that. There already is stuff that relies on things not
changing once a vcpu has run, so I'd rather be consistent.

>
> > > +      */
> > > +     if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
> > > +             ret = *fw_reg_bmap != val ? -EBUSY : 0;
> > > +             goto out;
> > > +     }
> > > +
> > > +     WRITE_ONCE(*fw_reg_bmap, val);
> >
> > I'm not sure what this WRITE_ONCE guards against. Do you expect
> > concurrent reads at this stage?
> >
> Again, the assumption here is that userspace could have multiple
> threads reading and writing to these registers. Without the VM scoped
> registers in place, we may end up with a read/write to the same memory
> location for all the vCPUs.

We only have one vcpu updating this at any given time (that's what the
lock ensures). A simple write should be OK, as far as I can tell.

Thanks,

	M.
Raghavendra Rao Ananta April 8, 2022, 5:34 p.m. UTC | #4
On Fri, Apr 8, 2022 at 9:59 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 07 Apr 2022 18:24:14 +0100,
> Raghavendra Rao Ananta <rananta@google.com> wrote:
> >
> > Hi Marc,
> >
> > > > +#define KVM_REG_ARM_STD_BIT_TRNG_V1_0                BIT(0)
> > >
> > > I'm really in two minds about this. Having one bit per service is easy
> > > from an implementation perspective, but it also means that this
> > > disallows fine-grained control over which hypercalls are actually
> > > available. If tomorrow TRNG 1.1 adds a new hypercall and KVM
> > > implements both, how does the selection mechanism work? You will
> > > need a version selector (a la PSCI), which somehow defeats this API
> > > (and renders the name of the #define invalid).
> > >
> > > I wonder if a more correct way to look at this is to enumerate the
> > > hypercalls themselves (all 5 of them), though coming up with an
> > > encoding is tricky (RNG32 and RNG64 would clash, for example).
> > >
> > > Thoughts?
> > >
> > I was on the fence about this too. The TRNG spec (ARM DEN 0098,
> > Table-4) mentions that v1.0 should have VERSION, FEATURES, GET_UUID,
> > and RND as mandatory features. Hence, if KVM advertised that it
> > supports TRNG v1.0, I thought it would be best to expose all or
> > nothing of v1.0 by guarding them with a single bit.
> > Broadly, the idea is to have a bit per version. If v1.1 comes along,
> > we can have another bit for that. If it's not too ugly to implement,
> > we can be a little more aggressive and ensure that userspace doesn't
> > enable v1.1 without enabling v1.0.
>
> OK, that'd be assuming that we'll never see a service where version A
> is incompatible with version B and that we have to exclude one or the
> other. Meh. Let's cross that bridge once it is actually built.
>
> [...]
>
> > > > +     mutex_lock(&kvm->lock);
> > > > +
> > > > +     /*
> > > > +      * If the VM (any vCPU) has already started running, return success
> > > > +      * if there's no change in the value. Else, return -EBUSY.
> > >
> > > No, this should *always* fail if a vcpu has started. Otherwise, you
> > > start allowing hard to spot races.
> > >
> > The idea came from the fact that userspace could spawn multiple
> > threads to configure the vCPU registers. Since we don't have
> > VM-scoped registers yet, it's possible that userspace has issued
> > a KVM_RUN on one of the vCPUs while the others are lagging behind,
> > still configuring the registers. The slower threads may see -EBUSY
> > and could panic. But if you feel that it's overkill and userspace
> > should deal with it, we can return -EBUSY for all writes after KVM_RUN.
>
> I'd rather have that. There already is stuff that relies on things not
> changing once a vcpu has run, so I'd rather be consistent.
>
Sure, I'll return EBUSY if the VM has started regardless of the incoming value.
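
(The agreed-upon behaviour amounts to dropping the value comparison
from the hunk below; a sketch:)

	/* Once any vCPU has run, the bitmap registers become immutable. */
	if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
		ret = -EBUSY;
		goto out;
	}
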
> >
> > > > +      */
> > > > +     if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
> > > > +             ret = *fw_reg_bmap != val ? -EBUSY : 0;
> > > > +             goto out;
> > > > +     }
> > > > +
> > > > +     WRITE_ONCE(*fw_reg_bmap, val);
> > >
> > > I'm not sure what this WRITE_ONCE guards against. Do you expect
> > > concurrent reads at this stage?
> > >
> > Again, the assumption here is that userspace could have multiple
> > threads reading and writing to these registers. Without the VM scoped
> > registers in place, we may end up with a read/write to the same memory
> > location for all the vCPUs.
>
> We only have one vcpu updating this at any given time (that's what the
> lock ensures). A simple write should be OK, as far as I can tell.
>
I agree that a write against another write should be fine without the
WRITE_ONCE. But my little concern was this write against a read
(unsure how userspace accesses these registers). I'm guessing it
shouldn't hurt to keep them in place, no? :)

Regards,
Raghavendra

> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Patch

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e3b25dc6c367..6e663383d7b4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -101,6 +101,15 @@  struct kvm_s2_mmu {
 struct kvm_arch_memory_slot {
 };
 
+/**
+ * struct kvm_smccc_features: Descriptor of the hypercall services exposed to the guests
+ *
+ * @std_bmap: Bitmap of standard secure service calls
+ */
+struct kvm_smccc_features {
+	u64 std_bmap;
+};
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
@@ -140,6 +149,9 @@  struct kvm_arch {
 
 	u8 pfr0_csv2;
 	u8 pfr0_csv3;
+
+	/* Hypercall features firmware registers' descriptor */
+	struct kvm_smccc_features smccc_feat;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index c1b6ddc02d2f..56e4bc58a355 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -332,6 +332,15 @@  struct kvm_arm_copy_mte_tags {
 #define KVM_ARM64_SVE_VLS_WORDS	\
 	((KVM_ARM64_SVE_VQ_MAX - KVM_ARM64_SVE_VQ_MIN) / 64 + 1)
 
+/* Bitmap feature firmware registers */
+#define KVM_REG_ARM_FW_FEAT_BMAP		(0x0016 << KVM_REG_ARM_COPROC_SHIFT)
+#define KVM_REG_ARM_FW_FEAT_BMAP_REG(r)		(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
+						KVM_REG_ARM_FW_FEAT_BMAP |	\
+						((r) & 0xffff))
+
+#define KVM_REG_ARM_STD_BMAP			KVM_REG_ARM_FW_FEAT_BMAP_REG(0)
+#define KVM_REG_ARM_STD_BIT_TRNG_V1_0		BIT(0)
+
 /* Device Control API: ARM VGIC */
 #define KVM_DEV_ARM_VGIC_GRP_ADDR	0
 #define KVM_DEV_ARM_VGIC_GRP_DIST_REGS	1
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 523bc934fe2f..a37fadbd617e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -156,6 +156,7 @@  int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
 
 	set_default_spectre(kvm);
+	kvm_arm_init_hypercalls(kvm);
 
 	return ret;
 out_free_stage2_pgd:
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 0d5cca56cbda..8c607199cad1 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -756,7 +756,9 @@  int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 
 	switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
 	case KVM_REG_ARM_CORE:	return get_core_reg(vcpu, reg);
-	case KVM_REG_ARM_FW:	return kvm_arm_get_fw_reg(vcpu, reg);
+	case KVM_REG_ARM_FW:
+	case KVM_REG_ARM_FW_FEAT_BMAP:
+		return kvm_arm_get_fw_reg(vcpu, reg);
 	case KVM_REG_ARM64_SVE:	return get_sve_reg(vcpu, reg);
 	}
 
@@ -774,7 +776,9 @@  int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 
 	switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
 	case KVM_REG_ARM_CORE:	return set_core_reg(vcpu, reg);
-	case KVM_REG_ARM_FW:	return kvm_arm_set_fw_reg(vcpu, reg);
+	case KVM_REG_ARM_FW:
+	case KVM_REG_ARM_FW_FEAT_BMAP:
+		return kvm_arm_set_fw_reg(vcpu, reg);
 	case KVM_REG_ARM64_SVE:	return set_sve_reg(vcpu, reg);
 	}
 
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index fa6d9378d8e7..cf04b5ee5f56 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -58,6 +58,53 @@  static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
 	val[3] = lower_32_bits(cycles);
 }
 
+/*
+ * List of function-ids that are not gated with the bitmapped feature
+ * firmware registers, and are to be allowed for servicing the call by default.
+ */
+static const u32 hvc_func_default_allowed_list[] = {
+	ARM_SMCCC_VERSION_FUNC_ID,
+	ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+	ARM_SMCCC_HV_PV_TIME_FEATURES,
+	ARM_SMCCC_HV_PV_TIME_ST,
+	ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID,
+	ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
+	ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
+};
+
+static bool kvm_hvc_call_default_allowed(struct kvm_vcpu *vcpu, u32 func_id)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(hvc_func_default_allowed_list); i++)
+		if (func_id == hvc_func_default_allowed_list[i])
+			return true;
+
+	return kvm_psci_func_id_is_valid(vcpu, func_id);
+}
+
+static bool kvm_arm_fw_reg_feat_enabled(u64 reg_bmap, u64 feat_bit)
+{
+	return reg_bmap & feat_bit;
+}
+
+static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
+{
+	struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
+
+	switch (func_id) {
+	case ARM_SMCCC_TRNG_VERSION:
+	case ARM_SMCCC_TRNG_FEATURES:
+	case ARM_SMCCC_TRNG_GET_UUID:
+	case ARM_SMCCC_TRNG_RND32:
+	case ARM_SMCCC_TRNG_RND64:
+		return kvm_arm_fw_reg_feat_enabled(smccc_feat->std_bmap,
+						KVM_REG_ARM_STD_BIT_TRNG_V1_0);
+	default:
+		return kvm_hvc_call_default_allowed(vcpu, func_id);
+	}
+}
+
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
 	u32 func_id = smccc_get_function(vcpu);
@@ -65,6 +112,9 @@  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 	u32 feature;
 	gpa_t gpa;
 
+	if (!kvm_hvc_call_allowed(vcpu, func_id))
+		goto out;
+
 	switch (func_id) {
 	case ARM_SMCCC_VERSION_FUNC_ID:
 		val[0] = ARM_SMCCC_VERSION_1_1;
@@ -155,6 +205,7 @@  int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 		return kvm_psci_call(vcpu);
 	}
 
+out:
 	smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
 	return 1;
 }
@@ -164,8 +215,16 @@  static const u64 kvm_arm_fw_reg_ids[] = {
 	KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
 	KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
 	KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3,
+	KVM_REG_ARM_STD_BMAP,
 };
 
+void kvm_arm_init_hypercalls(struct kvm *kvm)
+{
+	struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
+
+	smccc_feat->std_bmap = KVM_ARM_SMCCC_STD_FEATURES;
+}
+
 int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
 {
 	return ARRAY_SIZE(kvm_arm_fw_reg_ids);
@@ -237,6 +296,7 @@  static int get_kernel_wa_level(u64 regid)
 
 int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 {
+	struct kvm_smccc_features *smccc_feat = &vcpu->kvm->arch.smccc_feat;
 	void __user *uaddr = (void __user *)(long)reg->addr;
 	u64 val;
 
@@ -249,6 +309,9 @@  int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_3:
 		val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
 		break;
+	case KVM_REG_ARM_STD_BMAP:
+		val = READ_ONCE(smccc_feat->std_bmap);
+		break;
 	default:
 		return -ENOENT;
 	}
@@ -259,6 +322,43 @@  int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	return 0;
 }
 
+static int kvm_arm_set_fw_reg_bmap(struct kvm_vcpu *vcpu, u64 reg_id, u64 val)
+{
+	int ret = 0;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_smccc_features *smccc_feat = &kvm->arch.smccc_feat;
+	u64 *fw_reg_bmap, fw_reg_features;
+
+	switch (reg_id) {
+	case KVM_REG_ARM_STD_BMAP:
+		fw_reg_bmap = &smccc_feat->std_bmap;
+		fw_reg_features = KVM_ARM_SMCCC_STD_FEATURES;
+		break;
+	default:
+		return -ENOENT;
+	}
+
+	/* Check for unsupported bit */
+	if (val & ~fw_reg_features)
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	/*
+	 * If the VM (any vCPU) has already started running, return success
+	 * if there's no change in the value. Else, return -EBUSY.
+	 */
+	if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) {
+		ret = *fw_reg_bmap != val ? -EBUSY : 0;
+		goto out;
+	}
+
+	WRITE_ONCE(*fw_reg_bmap, val);
+out:
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
 int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 {
 	void __user *uaddr = (void __user *)(long)reg->addr;
@@ -337,6 +437,8 @@  int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 			return -EINVAL;
 
 		return 0;
+	case KVM_REG_ARM_STD_BMAP:
+		return kvm_arm_set_fw_reg_bmap(vcpu, reg->id, val);
 	default:
 		return -ENOENT;
 	}
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 5d38628a8d04..fd3ff350ee9d 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -6,6 +6,12 @@ 
 
 #include <asm/kvm_emulate.h>
 
+/* Last valid bits of the bitmapped firmware registers */
+#define KVM_REG_ARM_STD_BMAP_BIT_MAX		0
+
+#define KVM_ARM_SMCCC_STD_FEATURES \
+	GENMASK_ULL(KVM_REG_ARM_STD_BMAP_BIT_MAX, 0)
+
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
 
 static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
@@ -42,6 +48,7 @@  static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
 
 struct kvm_one_reg;
 
+void kvm_arm_init_hypercalls(struct kvm *kvm);
 int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
 int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index 6e55b9283789..d7a87367de56 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -36,6 +36,18 @@  static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
 	return KVM_ARM_PSCI_0_1;
 }
 
+static inline bool kvm_psci_func_id_is_valid(struct kvm_vcpu *vcpu, u32 func_id)
+{
+	/* PSCI 0.1 doesn't comply with the standard SMCCC */
+	if (kvm_psci_version(vcpu) == KVM_ARM_PSCI_0_1)
+		return (func_id == KVM_PSCI_FN_CPU_OFF || func_id == KVM_PSCI_FN_CPU_ON);
+
+	if (ARM_SMCCC_OWNER_NUM(func_id) == ARM_SMCCC_OWNER_STANDARD &&
+		ARM_SMCCC_FUNC_NUM(func_id) >= 0 && ARM_SMCCC_FUNC_NUM(func_id) <= 0x1f)
+		return true;
+
+	return false;
+}
 
 int kvm_psci_call(struct kvm_vcpu *vcpu);