Message ID | 1538141967-15375-12-git-send-email-Dave.Martin@arm.com (mailing list archive)
State | New, archived
Series | KVM: arm64: Initial support for SVE guests
On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote: > KVM_GET_REG_LIST should only enumerate registers that are actually > accessible, so it is necessary to filter out any register that is > not exposed to the guest. For features that are configured at > runtime, this will require a dynamic check. > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden > if SVE is not enabled for the guest. This implies that userspace can never access this interface for a vcpu before having decided whether such features are enabled for the guest or not, since otherwise userspace will see different states for a VCPU depending on sequencing of the API, which sounds fragile to me. That should probably be documented somewhere, and I hope the enable/disable API for SVE in guests already takes that into account. Not sure if there's an action to take here, but it was the best place I could raise this concern. Thanks, Christoffer > > Special-casing walk_one_sys_reg() for specific registers will make > the code unnecessarily messy, so this patch adds a new sysreg > method check_present() that, if defined, indicates whether the > sysreg should be enumerated. If the guest runtime configuration > may require a particular system register to be hidden, > check_present should point to a function that returns true or false > to enable or disable enumeration of that register respectively. > > Currently check_present() is not used for any other purpose, but it > may be a useful foundation for abstracting other parts of the code > to handle conditionally-present sysregs, if required. > > Signed-off-by: Dave Martin <Dave.Martin@arm.com> > --- > arch/arm64/kvm/sys_regs.c | 10 +++++++--- > arch/arm64/kvm/sys_regs.h | 4 ++++ > 2 files changed, 11 insertions(+), 3 deletions(-) > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c > index 0dfd064..adb6cbd 100644 > --- a/arch/arm64/kvm/sys_regs.c > +++ b/arch/arm64/kvm/sys_regs.c > @@ -2437,7 +2437,8 @@ static bool copy_reg_to_user(const struct sys_reg_desc *reg, u64 __user **uind) > return true; > } > > -static int walk_one_sys_reg(const struct sys_reg_desc *rd, > +static int walk_one_sys_reg(const struct kvm_vcpu *vcpu, > + const struct sys_reg_desc *rd, > u64 __user **uind, > unsigned int *total) > { > @@ -2448,6 +2449,9 @@ static int walk_one_sys_reg(const struct sys_reg_desc *rd, > if (!(rd->reg || rd->get_user)) > return 0; > > + if (rd->check_present && !rd->check_present(vcpu, rd)) > + return 0; > + > if (!copy_reg_to_user(rd, uind)) > return -EFAULT; > > @@ -2476,9 +2480,9 @@ static int walk_sys_regs(struct kvm_vcpu *vcpu, u64 __user *uind) > int cmp = cmp_sys_reg(i1, i2); > /* target-specific overrides generic entry. 
*/ > if (cmp <= 0) > - err = walk_one_sys_reg(i1, &uind, &total); > + err = walk_one_sys_reg(vcpu, i1, &uind, &total); > else > - err = walk_one_sys_reg(i2, &uind, &total); > + err = walk_one_sys_reg(vcpu, i2, &uind, &total); > > if (err) > return err; > diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h > index 24bac06..cffb31e 100644 > --- a/arch/arm64/kvm/sys_regs.h > +++ b/arch/arm64/kvm/sys_regs.h > @@ -61,6 +61,10 @@ struct sys_reg_desc { > const struct kvm_one_reg *reg, void __user *uaddr); > int (*set_user)(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, > const struct kvm_one_reg *reg, void __user *uaddr); > + > + /* Return true iff the register exists; assume present if NULL */ > + bool (*check_present)(const struct kvm_vcpu *vcpu, > + const struct sys_reg_desc *rd); > }; > > static inline void print_sys_reg_instr(const struct sys_reg_params *p) > -- > 2.1.4 > > _______________________________________________ > kvmarm mailing list > kvmarm@lists.cs.columbia.edu > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
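For illustration, a minimal sketch of how a conditionally-present register could use the new check_present() hook. The helper names (sve_check_present(), vcpu_has_sve(), access_zcr_el1) and the ZCR_EL1 descriptor entry are assumptions for this example, not code taken from the posted patch:

/* Sketch only: enumerate ZCR_EL1 only when the vcpu has SVE enabled. */
static bool sve_check_present(const struct kvm_vcpu *vcpu,
			      const struct sys_reg_desc *rd)
{
	return vcpu_has_sve(vcpu);	/* assumed per-vcpu SVE predicate */
}

/* Hypothetical descriptor entry wiring the hook in: */
{ SYS_DESC(SYS_ZCR_EL1), access_zcr_el1, reset_unknown, ZCR_EL1,
  .check_present = sve_check_present },

With this in place, walk_one_sys_reg() simply skips the register for vcpus where the hook returns false, so KVM_GET_REG_LIST never reports it.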
On Fri, Nov 02, 2018 at 09:16:25AM +0100, Christoffer Dall wrote: > On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote: > > KVM_GET_REG_LIST should only enumerate registers that are actually > > accessible, so it is necessary to filter out any register that is > > not exposed to the guest. For features that are configured at > > runtime, this will require a dynamic check. > > > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden > > if SVE is not enabled for the guest. > > This implies that userspace can never access this interface for a vcpu > before having decided whether such features are enabled for the guest or > not, since otherwise userspace will see different states for a VCPU > depending on sequencing of the API, which sounds fragile to me. > > That should probably be documented somewhere, and I hope the > enable/disable API for SVE in guests already takes that into account. > > Not sure if there's an action to take here, but it was the best place I > could raise this concern. Fair point. I struggled to come up with something better that solves all problems. My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of creating the vcpu, so that if issued at all for a vcpu, it is issued very soon after KVM_VCPU_INIT. I think this worked OK with the current structure of kvmtool and I seem to remember discussing this with Peter Maydell re qemu -- but it sounds like I should double-check. Either way, you're right, this needs to be clearly documented. If we want to be more robust, maybe we should add a capability too, so that userspace that enables this capability promises to call KVM_ARM_SVE_CONFIG_SET for each vcpu, and affected ioctls (KVM_RUN, KVM_GET_REG_LIST etc.) are forbidden until that is done? That should help avoid accidents. I could add a special meaning for an empty kvm_sve_vls, such that it doesn't enable SVE on the affected vcpu. That retains the ability to create heterogeneous guests while still following the above flow. Thoughts? [...] Cheers ---Dave
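As a rough sketch of the ordering Dave describes (treating KVM_ARM_SVE_CONFIG / KVM_ARM_SVE_CONFIG_SET and struct kvm_sve_vls as the proposed, not-yet-merged interfaces from this series), userspace would fix the SVE configuration immediately after vcpu init and only then enumerate registers:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* All SVE-related names below are from the proposed API and may change. */
static int vcpu_setup_with_sve(int vcpu_fd, struct kvm_vcpu_init *init,
			       struct kvm_sve_vls *vls)
{
	if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, init) < 0)
		return -1;

	/* Treat SVE configuration as part of vcpu creation... */
	if (ioctl(vcpu_fd, KVM_ARM_SVE_CONFIG, vls) < 0)	/* _SET subcommand */
		return -1;

	/* ...so that KVM_GET_REG_LIST (and later KVM_RUN) see a stable view. */
	return 0;
}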
[Adding Peter and Alex for their view on the QEMU side] On Thu, Nov 15, 2018 at 05:27:11PM +0000, Dave Martin wrote: > On Fri, Nov 02, 2018 at 09:16:25AM +0100, Christoffer Dall wrote: > > On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote: > > > KVM_GET_REG_LIST should only enumerate registers that are actually > > > accessible, so it is necessary to filter out any register that is > > > not exposed to the guest. For features that are configured at > > > runtime, this will require a dynamic check. > > > > > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden > > > if SVE is not enabled for the guest. > > > > This implies that userspace can never access this interface for a vcpu > > before having decided whether such features are enabled for the guest or > > not, since otherwise userspace will see different states for a VCPU > > depending on sequencing of the API, which sounds fragile to me. > > > > That should probably be documented somewhere, and I hope the > > enable/disable API for SVE in guests already takes that into account. > > > > Not sure if there's an action to take here, but it was the best place I > > could raise this concern. > > Fair point. I struggled to come up with something better that solves > all problems. > > My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of > creating the vcpu, so that if issued at all for a vcpu, it is issued > very soon after KVM_VCPU_INIT. > > I think this worked OK with the current structure of kvmtool and I > seem to remember discussing this with Peter Maydell re qemu -- but > it sounds like I should double-check. QEMU does something around enumerating all the system registers exposed by KVM and saving/restoring them as part of its startup, but I don't remember the exact sequence. > > Either way, you're right, this needs to be clearly documented. > > > If we want to be more robust, maybe we should add a capability too, > so that userspace that enables this capability promises to call > KVM_ARM_SVE_CONFIG_SET for each vcpu, and affected ioctls (KVM_RUN, > KVM_GET_REG_LIST etc.) are forbidden until that is done? > > That should help avoid accidents. > > I could add a special meaning for an empty kvm_sve_vls, such that > it doesn't enable SVE on the affected vcpu. That retains the ability > to create heterogeneous guests while still following the above flow. > I think making sure that userspace can only ever see the same list of available system registers is going to cause us less pain going forward. If the separate ioctl and capability check is the easiest way of doing that, then I think that sounds good. (I had wished we could have just added some data to KVM_CREATE_VCPU, but that doesn't seem to be the case.) Thanks, Christoffer
On 22 November 2018 at 10:53, Christoffer Dall <christoffer.dall@arm.com> wrote: > [Adding Peter and Alex for their view on the QEMU side] > > On Thu, Nov 15, 2018 at 05:27:11PM +0000, Dave Martin wrote: >> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of >> creating the vcpu, so that if issued at all for a vcpu, it is issued >> very soon after KVM_VCPU_INIT. >> >> I think this worked OK with the current structure of kvmtool and I >> seem to remember discussing this with Peter Maydell re qemu -- but >> it sounds like I should double-check. > > QEMU does some thing around enumerating all the system registers exposed > by KVM and saving/restoring them as part of its startup, but I don't > remember the exact sequence. This all happens in kvm_arch_init_vcpu(), which does: * KVM_ARM_VCPU_INIT ioctl (with the appropriate kvm_init_features set) * read the guest MPIDR with GET_ONE_REG so we know what KVM is doing with MPIDR assignment across CPUs * check for interesting extensions like KVM_CAP_SET_GUEST_DEBUG * get and cache a list of what system registers the vcpu has, using KVM_GET_REG_LIST. This is where we do the "size must be U32 or U64" sanity check. So if there's something we can't do by setting kvm_init_features for KVM_ARM_VCPU_INIT but have to do immediately afterwards, that is straightforward. The major requirement for QEMU is that if we don't specifically enable SVE in the VCPU then we must not see any registers in the KVM_GET_REG_LIST that are not u32 or u64 -- otherwise QEMU will refuse to start. thanks -- PMM
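For reference, the "size must be U32 or U64" sanity check relies on the size field encoded in every register ID returned by KVM_GET_REG_LIST. A simplified stand-alone version of such a check (not QEMU's actual code) might look like:

#include <stdbool.h>
#include <linux/kvm.h>

/* Register IDs encode their size in bits 52-55 (KVM_REG_SIZE_*). */
static bool reg_size_is_supported(__u64 reg_id)
{
	__u64 size = reg_id & KVM_REG_SIZE_MASK;

	return size == KVM_REG_SIZE_U32 || size == KVM_REG_SIZE_U64;
}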
Christoffer Dall <christoffer.dall@arm.com> writes: > [Adding Peter and Alex for their view on the QEMU side] > > On Thu, Nov 15, 2018 at 05:27:11PM +0000, Dave Martin wrote: >> On Fri, Nov 02, 2018 at 09:16:25AM +0100, Christoffer Dall wrote: >> > On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote: >> > > KVM_GET_REG_LIST should only enumerate registers that are actually >> > > accessible, so it is necessary to filter out any register that is >> > > not exposed to the guest. For features that are configured at >> > > runtime, this will require a dynamic check. >> > > >> > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden >> > > if SVE is not enabled for the guest. >> > >> > This implies that userspace can never access this interface for a vcpu >> > before having decided whether such features are enabled for the guest or >> > not, since otherwise userspace will see different states for a VCPU >> > depending on sequencing of the API, which sounds fragile to me. >> > >> > That should probably be documented somewhere, and I hope the >> > enable/disable API for SVE in guests already takes that into account. >> > >> > Not sure if there's an action to take here, but it was the best place I >> > could raise this concern. >> >> Fair point. I struggled to come up with something better that solves >> all problems. >> >> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of >> creating the vcpu, so that if issued at all for a vcpu, it is issued >> very soon after KVM_VCPU_INIT. >> >> I think this worked OK with the current structure of kvmtool and I >> seem to remember discussing this with Peter Maydell re qemu -- but >> it sounds like I should double-check. > > QEMU does some thing around enumerating all the system registers exposed > by KVM and saving/restoring them as part of its startup, but I don't > remember the exact sequence. QEMU does this for each vCPU as part of it's start-up sequence: kvm_init_vcpu kvm_get_cpu (-> KVM_CREATE_VCPU) KVM_GET_VCPU_MMAP_SIZE kvm_arch_init_vcpu kvm_arm_vcpu_init (-> KVM_ARM_VCPU_INIT) kvm_get_one_reg(ARM_CPU_ID_MPIDR) kvm_arm_init_debug (chk for KVM_CAP SET_GUEST_DEBUG/GUEST_DEBUG_HW_WPS/BPS) kvm_arm_init_serror_injection (chk KVM_CAP_ARM_INJECT_SERROR_ESR) kvm_arm_init_cpreg_list (KVM_GET_REG_LIST) At this point we have the register list we need for kvm_arch_get_registers which is what we call every time we want to synchronise state. We only really do this for debug events, crashes and at some point when migrating. > >> >> Either way, you're right, this needs to be clearly documented. >> >> >> If we want to be more robust, maybe we should add a capability too, >> so that userspace that enables this capability promises to call >> KVM_ARM_SVE_CONFIG_SET for each vcpu, and affected ioctls (KVM_RUN, >> KVM_GET_REG_LIST etc.) are forbidden until that is done? >> >> That should help avoid accidents. >> >> I could add a special meaning for an empty kvm_sve_vls, such that >> it doesn't enable SVE on the affected vcpu. That retains the ability >> to create heterogeneous guests while still following the above flow. >> > I think making sure that userspace can ever only see the same list of > available system regiters is going to cause us less pain going forward. > > If the separate ioctl and capability check is the easiest way of doing > that, then I think that sounds good. (I had wished we could have just > added some data to KVM_CREATE_VCPU, but that doesn't seem to be the > case.) > > > Thanks, > > Christoffer -- Alex Bennée
On Thu, Nov 22, 2018 at 11:27:53AM +0000, Alex Bennée wrote: > > Christoffer Dall <christoffer.dall@arm.com> writes: > > > [Adding Peter and Alex for their view on the QEMU side] > > > > On Thu, Nov 15, 2018 at 05:27:11PM +0000, Dave Martin wrote: > >> On Fri, Nov 02, 2018 at 09:16:25AM +0100, Christoffer Dall wrote: > >> > On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote: > >> > > KVM_GET_REG_LIST should only enumerate registers that are actually > >> > > accessible, so it is necessary to filter out any register that is > >> > > not exposed to the guest. For features that are configured at > >> > > runtime, this will require a dynamic check. > >> > > > >> > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden > >> > > if SVE is not enabled for the guest. > >> > > >> > This implies that userspace can never access this interface for a vcpu > >> > before having decided whether such features are enabled for the guest or > >> > not, since otherwise userspace will see different states for a VCPU > >> > depending on sequencing of the API, which sounds fragile to me. > >> > > >> > That should probably be documented somewhere, and I hope the > >> > enable/disable API for SVE in guests already takes that into account. > >> > > >> > Not sure if there's an action to take here, but it was the best place I > >> > could raise this concern. > >> > >> Fair point. I struggled to come up with something better that solves > >> all problems. > >> > >> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of > >> creating the vcpu, so that if issued at all for a vcpu, it is issued > >> very soon after KVM_VCPU_INIT. > >> > >> I think this worked OK with the current structure of kvmtool and I > >> seem to remember discussing this with Peter Maydell re qemu -- but > >> it sounds like I should double-check. > > > > QEMU does some thing around enumerating all the system registers exposed > > by KVM and saving/restoring them as part of its startup, but I don't > > remember the exact sequence. > > QEMU does this for each vCPU as part of it's start-up sequence: > > kvm_init_vcpu > kvm_get_cpu (-> KVM_CREATE_VCPU) > KVM_GET_VCPU_MMAP_SIZE > kvm_arch_init_vcpu > kvm_arm_vcpu_init (-> KVM_ARM_VCPU_INIT) > kvm_get_one_reg(ARM_CPU_ID_MPIDR) > kvm_arm_init_debug (chk for KVM_CAP SET_GUEST_DEBUG/GUEST_DEBUG_HW_WPS/BPS) > kvm_arm_init_serror_injection (chk KVM_CAP_ARM_INJECT_SERROR_ESR) > kvm_arm_init_cpreg_list (KVM_GET_REG_LIST) > > At this point we have the register list we need for > kvm_arch_get_registers which is what we call every time we want to > synchronise state. We only really do this for debug events, crashes and > at some point when migrating. So we would need to insert KVM_ARM_SVE_CONFIG_SET into this sequence, meaning that the new capability is not strictly necessary. I sympathise with Christoffer's view though that without the capability mechanism it may be too easy for software to make mistakes: code refactoring might swap the KVM_GET_REG_LIST and KVM_ARM_SVE_CONFIG ioctls over and then things would go wrong with no immediate error indication. In effect, the SVE regs would be missing from the list yielded by KVM_GET_REG_LIST, possibly leading to silent migration failures. I'm a bit uneasy about that. Am I being too paranoid now? Cheers ---Dave
On Thu, Nov 22, 2018 at 11:13:51AM +0000, Peter Maydell wrote: > On 22 November 2018 at 10:53, Christoffer Dall <christoffer.dall@arm.com> wrote: > > [Adding Peter and Alex for their view on the QEMU side] > > > > On Thu, Nov 15, 2018 at 05:27:11PM +0000, Dave Martin wrote: > >> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of > >> creating the vcpu, so that if issued at all for a vcpu, it is issued > >> very soon after KVM_VCPU_INIT. > >> > >> I think this worked OK with the current structure of kvmtool and I > >> seem to remember discussing this with Peter Maydell re qemu -- but > >> it sounds like I should double-check. > > > > QEMU does some thing around enumerating all the system registers exposed > > by KVM and saving/restoring them as part of its startup, but I don't > > remember the exact sequence. > > This all happens in kvm_arch_init_vcpu(), which does: > * KVM_ARM_VCPU_INIT ioctl (with the appropriate kvm_init_features set) > * read the guest MPIDR with GET_ONE_REG so we know what KVM > is doing with MPIDR assignment across CPUs > * check for interesting extensions like KVM_CAP_SET_GUEST_DEBUG > * get and cache a list of what system registers the vcpu has, > using KVM_GET_REG_LIST. This is where we do the "size must > be U32 or U64" sanity check. > > So if there's something we can't do by setting kvm_init_features > for KVM_ARM_VCPU_INIT but have to do immediately afterwards, > that is straightforward. > > The major requirement for QEMU is that if we don't specifically > enable SVE in the VCPU then we must not see any registers > in the KVM_GET_REG_LIST that are not u32 or u64 -- otherwise > QEMU will refuse to start. > So on migration, will you have the required information for KVM_ARM_VCPU_INIT before setting the registers from the migration stream? (I assume so, because presumably this comes from a command-line switch or from the machine definition, which must match the source.) Therefore, I don't think there's an issue with this patch, but from bitter experience I think we should enforce ordering if possible. Thanks, Christoffer
On 22 November 2018 at 12:34, Christoffer Dall <christoffer.dall@arm.com> wrote: > So on migration, will you have the required information for > KVM_ARM_VCPU_INIT before setting the registers from the migration > stream? > > (I assume so, because presumably this comes from a command-line switch > or from the machine definition, which must match the source.) Yes. QEMU always sets up the VCPU completely before doing any inbound migration. > Therefore, I don't think there's an issue with this patch, but from > bitter experience I think we should enforce ordering if possible. Yes, if there are semantic ordering constraints on the various calls it would be nice to have the kernel enforce them. thanks -- PMM
On Thu, Nov 22, 2018 at 01:32:37PM +0100, Dave P Martin wrote: > On Thu, Nov 22, 2018 at 11:27:53AM +0000, Alex Bennée wrote: > > > > Christoffer Dall <christoffer.dall@arm.com> writes: > > > > > [Adding Peter and Alex for their view on the QEMU side] > > > > > > On Thu, Nov 15, 2018 at 05:27:11PM +0000, Dave Martin wrote: > > >> On Fri, Nov 02, 2018 at 09:16:25AM +0100, Christoffer Dall wrote: > > >> > On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote: > > >> > > KVM_GET_REG_LIST should only enumerate registers that are actually > > >> > > accessible, so it is necessary to filter out any register that is > > >> > > not exposed to the guest. For features that are configured at > > >> > > runtime, this will require a dynamic check. > > >> > > > > >> > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden > > >> > > if SVE is not enabled for the guest. > > >> > > > >> > This implies that userspace can never access this interface for a vcpu > > >> > before having decided whether such features are enabled for the guest or > > >> > not, since otherwise userspace will see different states for a VCPU > > >> > depending on sequencing of the API, which sounds fragile to me. > > >> > > > >> > That should probably be documented somewhere, and I hope the > > >> > enable/disable API for SVE in guests already takes that into account. > > >> > > > >> > Not sure if there's an action to take here, but it was the best place I > > >> > could raise this concern. > > >> > > >> Fair point. I struggled to come up with something better that solves > > >> all problems. > > >> > > >> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of > > >> creating the vcpu, so that if issued at all for a vcpu, it is issued > > >> very soon after KVM_VCPU_INIT. > > >> > > >> I think this worked OK with the current structure of kvmtool and I > > >> seem to remember discussing this with Peter Maydell re qemu -- but > > >> it sounds like I should double-check. > > > > > > QEMU does some thing around enumerating all the system registers exposed > > > by KVM and saving/restoring them as part of its startup, but I don't > > > remember the exact sequence. > > > > QEMU does this for each vCPU as part of it's start-up sequence: > > > > kvm_init_vcpu > > kvm_get_cpu (-> KVM_CREATE_VCPU) > > KVM_GET_VCPU_MMAP_SIZE > > kvm_arch_init_vcpu > > kvm_arm_vcpu_init (-> KVM_ARM_VCPU_INIT) > > kvm_get_one_reg(ARM_CPU_ID_MPIDR) > > kvm_arm_init_debug (chk for KVM_CAP SET_GUEST_DEBUG/GUEST_DEBUG_HW_WPS/BPS) > > kvm_arm_init_serror_injection (chk KVM_CAP_ARM_INJECT_SERROR_ESR) > > kvm_arm_init_cpreg_list (KVM_GET_REG_LIST) > > > > At this point we have the register list we need for > > kvm_arch_get_registers which is what we call every time we want to > > synchronise state. We only really do this for debug events, crashes and > > at some point when migrating. > > So we would need to insert KVM_ARM_SVE_CONFIG_SET into this sequence, > meaning that the new capability is not strictly necessary. > > I sympathise with Christoffer's view though that without the capability > mechanism it may be too easy for software to make mistakes: code > refactoring might swap the KVM_GET_REG_LIST and KVM_ARM_SVE_CONFIG ioctls > over and then things would go wrong with no immediate error indication. > > In effect, the SVE regs would be missing from the list yielded by > KVM_GET_REG_LIST, possibly leading to silent migration failures. > > I'm a bit uneasy about that. Am I being too paranoid now? 
No, we've made decisions in the past where we didn't enforce ordering which ended up being a huge pain (vgic lazy init, as a clear example of something really bad). Of course, it's a tradeoff. If it's a huge pain to implement, maybe things will be ok, but if it's just a read/write capability handshake, I think it's worth doing. Thanks, Christoffer
On Thu, Nov 22, 2018 at 02:07:18PM +0100, Christoffer Dall wrote: > On Thu, Nov 22, 2018 at 01:32:37PM +0100, Dave P Martin wrote: > > On Thu, Nov 22, 2018 at 11:27:53AM +0000, Alex Bennée wrote: > > > > > > Christoffer Dall <christoffer.dall@arm.com> writes: > > > > > > > [Adding Peter and Alex for their view on the QEMU side] > > > > > > > > On Thu, Nov 15, 2018 at 05:27:11PM +0000, Dave Martin wrote: > > > >> On Fri, Nov 02, 2018 at 09:16:25AM +0100, Christoffer Dall wrote: > > > >> > On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote: > > > >> > > KVM_GET_REG_LIST should only enumerate registers that are actually > > > >> > > accessible, so it is necessary to filter out any register that is > > > >> > > not exposed to the guest. For features that are configured at > > > >> > > runtime, this will require a dynamic check. > > > >> > > > > > >> > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden > > > >> > > if SVE is not enabled for the guest. > > > >> > > > > >> > This implies that userspace can never access this interface for a vcpu > > > >> > before having decided whether such features are enabled for the guest or > > > >> > not, since otherwise userspace will see different states for a VCPU > > > >> > depending on sequencing of the API, which sounds fragile to me. > > > >> > > > > >> > That should probably be documented somewhere, and I hope the > > > >> > enable/disable API for SVE in guests already takes that into account. > > > >> > > > > >> > Not sure if there's an action to take here, but it was the best place I > > > >> > could raise this concern. > > > >> > > > >> Fair point. I struggled to come up with something better that solves > > > >> all problems. > > > >> > > > >> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of > > > >> creating the vcpu, so that if issued at all for a vcpu, it is issued > > > >> very soon after KVM_VCPU_INIT. > > > >> > > > >> I think this worked OK with the current structure of kvmtool and I > > > >> seem to remember discussing this with Peter Maydell re qemu -- but > > > >> it sounds like I should double-check. > > > > > > > > QEMU does some thing around enumerating all the system registers exposed > > > > by KVM and saving/restoring them as part of its startup, but I don't > > > > remember the exact sequence. > > > > > > QEMU does this for each vCPU as part of it's start-up sequence: > > > > > > kvm_init_vcpu > > > kvm_get_cpu (-> KVM_CREATE_VCPU) > > > KVM_GET_VCPU_MMAP_SIZE > > > kvm_arch_init_vcpu > > > kvm_arm_vcpu_init (-> KVM_ARM_VCPU_INIT) > > > kvm_get_one_reg(ARM_CPU_ID_MPIDR) > > > kvm_arm_init_debug (chk for KVM_CAP SET_GUEST_DEBUG/GUEST_DEBUG_HW_WPS/BPS) > > > kvm_arm_init_serror_injection (chk KVM_CAP_ARM_INJECT_SERROR_ESR) > > > kvm_arm_init_cpreg_list (KVM_GET_REG_LIST) > > > > > > At this point we have the register list we need for > > > kvm_arch_get_registers which is what we call every time we want to > > > synchronise state. We only really do this for debug events, crashes and > > > at some point when migrating. > > > > So we would need to insert KVM_ARM_SVE_CONFIG_SET into this sequence, > > meaning that the new capability is not strictly necessary. > > > > I sympathise with Christoffer's view though that without the capability > > mechanism it may be too easy for software to make mistakes: code > > refactoring might swap the KVM_GET_REG_LIST and KVM_ARM_SVE_CONFIG ioctls > > over and then things would go wrong with no immediate error indication. 
> > > > In effect, the SVE regs would be missing from the list yielded by > > KVM_GET_REG_LIST, possibly leading to silent migration failures. > > > > I'm a bit uneasy about that. Am I being too paranoid now? > > > > No, we've made decisions in the past where we didn't enforce ordering > which ended up being a huge pain (vgic lazy init, as a clear example of > something really bad). Of course, it's a tradeoff. If it's a huge pain > to implement, maybe things will be ok, but if it's just a read/write > capability handshake, I think it's worth doing. OK, I'll add the capability and enforcement in the respin. We never came up with a way to extend KVM_VCPU_INIT that felt 100% right for this case, and a capability is straightforward to reason about, even if it's a bit clunky. Cheers ---Dave
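A minimal sketch of what that enforcement could look like on the kernel side, assuming a hypothetical per-vcpu "SVE configuration finalized" flag and leaving the exact error code open; none of these names appear in the posted patches:

#define KVM_ARM64_VCPU_SVE_FINALIZED	(1UL << 0)	/* hypothetical flag bit */

static bool kvm_arm_vcpu_sve_finalized(const struct kvm_vcpu *vcpu)
{
	return vcpu->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED;
}

/* Would be called at the top of ioctls such as KVM_RUN and
 * KVM_GET_REG_LIST once userspace has enabled the proposed capability: */
static int kvm_arm_vcpu_require_finalized(const struct kvm_vcpu *vcpu)
{
	if (!kvm_arm_vcpu_sve_finalized(vcpu))
		return -EPERM;	/* error code choice is still to be decided */

	return 0;
}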
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0dfd064..adb6cbd 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2437,7 +2437,8 @@ static bool copy_reg_to_user(const struct sys_reg_desc *reg, u64 __user **uind)
 	return true;
 }
 
-static int walk_one_sys_reg(const struct sys_reg_desc *rd,
+static int walk_one_sys_reg(const struct kvm_vcpu *vcpu,
+			    const struct sys_reg_desc *rd,
 			    u64 __user **uind,
 			    unsigned int *total)
 {
@@ -2448,6 +2449,9 @@ static int walk_one_sys_reg(const struct sys_reg_desc *rd,
 	if (!(rd->reg || rd->get_user))
 		return 0;
 
+	if (rd->check_present && !rd->check_present(vcpu, rd))
+		return 0;
+
 	if (!copy_reg_to_user(rd, uind))
 		return -EFAULT;
 
@@ -2476,9 +2480,9 @@ static int walk_sys_regs(struct kvm_vcpu *vcpu, u64 __user *uind)
 		int cmp = cmp_sys_reg(i1, i2);
 		/* target-specific overrides generic entry. */
 		if (cmp <= 0)
-			err = walk_one_sys_reg(i1, &uind, &total);
+			err = walk_one_sys_reg(vcpu, i1, &uind, &total);
 		else
-			err = walk_one_sys_reg(i2, &uind, &total);
+			err = walk_one_sys_reg(vcpu, i2, &uind, &total);
 
 		if (err)
 			return err;
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index 24bac06..cffb31e 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -61,6 +61,10 @@ struct sys_reg_desc {
 			const struct kvm_one_reg *reg, void __user *uaddr);
 	int (*set_user)(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
 			const struct kvm_one_reg *reg, void __user *uaddr);
+
+	/* Return true iff the register exists; assume present if NULL */
+	bool (*check_present)(const struct kvm_vcpu *vcpu,
+			      const struct sys_reg_desc *rd);
 };
 
 static inline void print_sys_reg_instr(const struct sys_reg_params *p)
KVM_GET_REG_LIST should only enumerate registers that are actually accessible, so it is necessary to filter out any register that is not exposed to the guest. For features that are configured at runtime, this will require a dynamic check.

For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden if SVE is not enabled for the guest.

Special-casing walk_one_sys_reg() for specific registers will make the code unnecessarily messy, so this patch adds a new sysreg method check_present() that, if defined, indicates whether the sysreg should be enumerated. If the guest runtime configuration may require a particular system register to be hidden, check_present should point to a function that returns true or false to enable or disable enumeration of that register respectively.

Currently check_present() is not used for any other purpose, but it may be a useful foundation for abstracting other parts of the code to handle conditionally-present sysregs, if required.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 10 +++++++---
 arch/arm64/kvm/sys_regs.h |  4 ++++
 2 files changed, 11 insertions(+), 3 deletions(-)