Message ID | 20190802074620.115029-8-anup.patel@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM RISC-V Support | expand |
On 02/08/19 09:47, Anup Patel wrote: > + if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) > + kvm_riscv_vcpu_flush_interrupts(vcpu, false); Not updating the vsip CSR here can cause an interrupt to be lost, if the next call to kvm_riscv_vcpu_flush_interrupts finds a zero mask. You could add a new field vcpu->vsip_shadow that is updated every time CSR_VSIP is written (including kvm_arch_vcpu_load) with a function like void kvm_riscv_update_vsip(struct kvm_vcpu *vcpu) { if (vcpu->vsip_shadow != vcpu->arch.guest_csr.vsip) { csr_write(CSR_VSIP, vcpu->arch.guest_csr.vsip); vcpu->vsip_shadow = vcpu->arch.guest_csr.vsip; } } And just call this unconditionally from kvm_vcpu_ioctl_run. The cost is just a memory load per VS-mode entry, it should hardly be measurable. Paolo
On Fri, Aug 2, 2019 at 2:33 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 02/08/19 09:47, Anup Patel wrote: > > + if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) > > + kvm_riscv_vcpu_flush_interrupts(vcpu, false); > > Not updating the vsip CSR here can cause an interrupt to be lost, if the > next call to kvm_riscv_vcpu_flush_interrupts finds a zero mask. Thanks for catching this issue. I will address it in v3. If we think more on similar lines then we also need to handle the case where Guest VCPU had pending interrupts and we suddenly stopped it for Guest migration. In this case, we would eventually use SET_ONE_REG ioctl on destination Host which should set vsip_shadow instead of vsip so that we force update HW after resuming Guest VCPU on destination host. > > You could add a new field vcpu->vsip_shadow that is updated every time > CSR_VSIP is written (including kvm_arch_vcpu_load) with a function like > > void kvm_riscv_update_vsip(struct kvm_vcpu *vcpu) > { > if (vcpu->vsip_shadow != vcpu->arch.guest_csr.vsip) { > csr_write(CSR_VSIP, vcpu->arch.guest_csr.vsip); > vcpu->vsip_shadow = vcpu->arch.guest_csr.vsip; > } > } > > And just call this unconditionally from kvm_vcpu_ioctl_run. The cost is > just a memory load per VS-mode entry, it should hardly be measurable. > I think we can do this at start of kvm_riscv_vcpu_flush_interrupts() as well. Regards, Anup
On 05/08/19 08:55, Anup Patel wrote: > On Fri, Aug 2, 2019 at 2:33 PM Paolo Bonzini <pbonzini@redhat.com> wrote: >> >> On 02/08/19 09:47, Anup Patel wrote: >>> + if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) >>> + kvm_riscv_vcpu_flush_interrupts(vcpu, false); >> >> Not updating the vsip CSR here can cause an interrupt to be lost, if the >> next call to kvm_riscv_vcpu_flush_interrupts finds a zero mask. > > Thanks for catching this issue. I will address it in v3. > > If we think more on similar lines then we also need to handle the case > where Guest VCPU had pending interrupts and we suddenly stopped it > for Guest migration. In this case, we would eventually use SET_ONE_REG > ioctl on destination Host which should set vsip_shadow instead of vsip so > that we force update HW after resuming Guest VCPU on destination host. I think it's simpler than that. vcpu->vsip_shadow is just the current value of CSR_VSIP so that you do not need to update it unconditionally on every vmentry. That is, kvm_vcpu_arch_load should do csr_write(CSR_VSIP, vcpu->arch.guest_csr.vsip); vcpu->vsip_shadow = vcpu->arch.guest_csr.vsip; while every other write can go through kvm_riscv_update_vsip. But vsip_shadow is completely disconnected from SET_ONE_REG; SET_ONE_REG can just write vcpu->arch.guest_csr.vsip and clear irqs_pending_mask, the next entry will write CSR_VSIP and vsip_shadow if needed. In fact, instead of placing it in kvm_vcpu, vsip_shadow could be a percpu variable; on hardware_enable you write 0 to both vsip_shadow and CSR_VSIP, and then kvm_arch_vcpu_load does not have to touch CSR_VSIP at all (only kvm_riscv_vcpu_flush_interrupts). I think this makes the purpose of vsip_shadow even clearer, so I highly suggest doing that. >> You could add a new field vcpu->vsip_shadow that is updated every time >> CSR_VSIP is written (including kvm_arch_vcpu_load) with a function like >> >> void kvm_riscv_update_vsip(struct kvm_vcpu *vcpu) >> { >> if (vcpu->vsip_shadow != vcpu->arch.guest_csr.vsip) { >> csr_write(CSR_VSIP, vcpu->arch.guest_csr.vsip); >> vcpu->vsip_shadow = vcpu->arch.guest_csr.vsip; >> } >> } >> >> And just call this unconditionally from kvm_vcpu_ioctl_run. The cost is >> just a memory load per VS-mode entry, it should hardly be measurable. > > I think we can do this at start of kvm_riscv_vcpu_flush_interrupts() as well. Did you mean at the end? (That is, after modifying vcpu->arch.guest_csr.vsip based on mask and val). With the above switch to percpu, the only write of CSR_VSIP and vsip_shadow should be in kvm_riscv_vcpu_flush_interrupts, which in turn is only called from kvm_vcpu_ioctl_run. Thanks, Paolo
On Mon, Aug 5, 2019 at 12:40 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 05/08/19 08:55, Anup Patel wrote: > > On Fri, Aug 2, 2019 at 2:33 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > >> > >> On 02/08/19 09:47, Anup Patel wrote: > >>> + if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) > >>> + kvm_riscv_vcpu_flush_interrupts(vcpu, false); > >> > >> Not updating the vsip CSR here can cause an interrupt to be lost, if the > >> next call to kvm_riscv_vcpu_flush_interrupts finds a zero mask. > > > > Thanks for catching this issue. I will address it in v3. > > > > If we think more on similar lines then we also need to handle the case > > where Guest VCPU had pending interrupts and we suddenly stopped it > > for Guest migration. In this case, we would eventually use SET_ONE_REG > > ioctl on destination Host which should set vsip_shadow instead of vsip so > > that we force update HW after resuming Guest VCPU on destination host. > > I think it's simpler than that. > > vcpu->vsip_shadow is just the current value of CSR_VSIP so that you do > not need to update it unconditionally on every vmentry. That is, > kvm_vcpu_arch_load should do > > csr_write(CSR_VSIP, vcpu->arch.guest_csr.vsip); > vcpu->vsip_shadow = vcpu->arch.guest_csr.vsip; > > while every other write can go through kvm_riscv_update_vsip. But > vsip_shadow is completely disconnected from SET_ONE_REG; SET_ONE_REG can > just write vcpu->arch.guest_csr.vsip and clear irqs_pending_mask, the > next entry will write CSR_VSIP and vsip_shadow if needed. > > In fact, instead of placing it in kvm_vcpu, vsip_shadow could be a > percpu variable; on hardware_enable you write 0 to both vsip_shadow and > CSR_VSIP, and then kvm_arch_vcpu_load does not have to touch CSR_VSIP at > all (only kvm_riscv_vcpu_flush_interrupts). I think this makes the > purpose of vsip_shadow even clearer, so I highly suggest doing that. Yes, having vsip_shadow as percpu variable makes sense. I will update accordingly. > > >> You could add a new field vcpu->vsip_shadow that is updated every time > >> CSR_VSIP is written (including kvm_arch_vcpu_load) with a function like > >> > >> void kvm_riscv_update_vsip(struct kvm_vcpu *vcpu) > >> { > >> if (vcpu->vsip_shadow != vcpu->arch.guest_csr.vsip) { > >> csr_write(CSR_VSIP, vcpu->arch.guest_csr.vsip); > >> vcpu->vsip_shadow = vcpu->arch.guest_csr.vsip; > >> } > >> } > >> > >> And just call this unconditionally from kvm_vcpu_ioctl_run. The cost is > >> just a memory load per VS-mode entry, it should hardly be measurable. > > > > I think we can do this at start of kvm_riscv_vcpu_flush_interrupts() as well. > > Did you mean at the end? (That is, after modifying > vcpu->arch.guest_csr.vsip based on mask and val). With the above switch > to percpu, the only write of CSR_VSIP and vsip_shadow should be in > kvm_riscv_vcpu_flush_interrupts, which in turn is only called from > kvm_vcpu_ioctl_run. Yes, I meant at the end of kvm_riscv_vcpu_flush_interrupts() but I am fine having separate kvm_riscv_update_vsip() function as well. Regards, Anup
On 05/08/19 13:00, Anup Patel wrote: >>> I think we can do this at start of kvm_riscv_vcpu_flush_interrupts() as well. >> Did you mean at the end? (That is, after modifying >> vcpu->arch.guest_csr.vsip based on mask and val). With the above switch >> to percpu, the only write of CSR_VSIP and vsip_shadow should be in >> kvm_riscv_vcpu_flush_interrupts, which in turn is only called from >> kvm_vcpu_ioctl_run. > Yes, I meant at the end of kvm_riscv_vcpu_flush_interrupts() but I am > fine having separate kvm_riscv_update_vsip() function as well. At end is certainly fine for me. Paolo
On 02.08.19 09:47, Anup Patel wrote: > For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access > VCPU config and registers from user-space. > > We have three types of VCPU registers: > 1. CONFIG - these are VCPU config and capabilities > 2. CORE - these are VCPU general purpose registers > 3. CSR - these are VCPU control and status registers > > The CONFIG registers available to user-space are ISA and TIMEBASE. Out > of these, TIMEBASE is a read-only register which inform user-space about > VCPU timer base frequency. The ISA register is a read and write register > where user-space can only write the desired VCPU ISA capabilities before > running the VCPU. > > The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7, > T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except > PC and MODE. The PC register represents program counter whereas the MODE > register represent VCPU privilege mode (i.e. S/U-mode). > > The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC, > SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers. > > In future, more VCPU register types will be added (such as FP) for the > KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls. While have ONE_REG will certainly work, have you considered the sync_reg scheme (registers as part of kvm_run structure) This will speed up the exit to QEMU as you do not have to do multiple ioctls (each imposing a systemcall overhead) for one exit. Ideally you should not exit too often into qemu, but for those cases sync_regs is faster than ONE_REG.
On Mon, Aug 5, 2019 at 5:08 PM Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > > On 02.08.19 09:47, Anup Patel wrote: > > For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access > > VCPU config and registers from user-space. > > > > We have three types of VCPU registers: > > 1. CONFIG - these are VCPU config and capabilities > > 2. CORE - these are VCPU general purpose registers > > 3. CSR - these are VCPU control and status registers > > > > The CONFIG registers available to user-space are ISA and TIMEBASE. Out > > of these, TIMEBASE is a read-only register which inform user-space about > > VCPU timer base frequency. The ISA register is a read and write register > > where user-space can only write the desired VCPU ISA capabilities before > > running the VCPU. > > > > The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7, > > T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except > > PC and MODE. The PC register represents program counter whereas the MODE > > register represent VCPU privilege mode (i.e. S/U-mode). > > > > The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC, > > SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers. > > > > In future, more VCPU register types will be added (such as FP) for the > > KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls. > > While have ONE_REG will certainly work, have you considered the sync_reg scheme > (registers as part of kvm_run structure) > This will speed up the exit to QEMU as you do not have to do multiple ioctls > (each imposing a systemcall overhead) for one exit. > > Ideally you should not exit too often into qemu, but for those cases sync_regs > is faster than ONE_REG. > We will certainly explore sync_regs interface. Reducing exits to user-space will definitely help. This is the first series for KVM RISC-V so here we want to establish a stable and extensible UAPI header using which we will add support to QEMU KVM. For time being, we are using KVMTOOL for debug and development. Thanks, Anup
On 05/08/19 13:37, Christian Borntraeger wrote: > While have ONE_REG will certainly work, have you considered the sync_reg scheme > (registers as part of kvm_run structure) > This will speed up the exit to QEMU as you do not have to do multiple ioctls > (each imposing a systemcall overhead) for one exit. > > Ideally you should not exit too often into qemu, but for those cases sync_regs > is faster than ONE_REG. At least in theory, RISC-V should never have exits to QEMU that need accessing the registers. (The same is true for x86; Google implemented sync_regs because they moved the instruction emulator to userspace in their fork). Paolo
On 05/08/19 13:56, Anup Patel wrote: > We will certainly explore sync_regs interface. Reducing exits to user-space > will definitely help. sync_regs does not reduce exits to userspace, it reduces ioctls from userspace but there is a real benefit only if userspace actually makes many syscalls for each vmexit. Paolo
On Mon, Aug 5, 2019 at 5:31 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 05/08/19 13:56, Anup Patel wrote: > > We will certainly explore sync_regs interface. Reducing exits to user-space > > will definitely help. > > sync_regs does not reduce exits to userspace, it reduces ioctls from > userspace but there is a real benefit only if userspace actually makes > many syscalls for each vmexit. Thanks, got it. Regards, Anup
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h index 6dbc056d58ba..024f220eb17e 100644 --- a/arch/riscv/include/uapi/asm/kvm.h +++ b/arch/riscv/include/uapi/asm/kvm.h @@ -23,8 +23,15 @@ /* for KVM_GET_REGS and KVM_SET_REGS */ struct kvm_regs { + /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */ + struct user_regs_struct regs; + unsigned long mode; }; +/* Possible privilege modes for kvm_regs */ +#define KVM_RISCV_MODE_S 1 +#define KVM_RISCV_MODE_U 0 + /* for KVM_GET_FPU and KVM_SET_FPU */ struct kvm_fpu { }; @@ -41,10 +48,41 @@ struct kvm_guest_debug_arch { struct kvm_sync_regs { }; -/* dummy definition */ +/* for KVM_GET_SREGS and KVM_SET_SREGS */ struct kvm_sregs { + unsigned long sstatus; + unsigned long sie; + unsigned long stvec; + unsigned long sscratch; + unsigned long sepc; + unsigned long scause; + unsigned long stval; + unsigned long sip; + unsigned long satp; }; +#define KVM_REG_SIZE(id) \ + (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT)) + +/* If you need to interpret the index values, here is the key: */ +#define KVM_REG_RISCV_TYPE_MASK 0x00000000FF000000 +#define KVM_REG_RISCV_TYPE_SHIFT 24 + +/* Config registers are mapped as type 1 */ +#define KVM_REG_RISCV_CONFIG (0x01 << KVM_REG_RISCV_TYPE_SHIFT) +#define KVM_REG_RISCV_CONFIG_ISA 0x0 +#define KVM_REG_RISCV_CONFIG_TIMEBASE 0x1 + +/* Core registers are mapped as type 2 */ +#define KVM_REG_RISCV_CORE (0x02 << KVM_REG_RISCV_TYPE_SHIFT) +#define KVM_REG_RISCV_CORE_REG(name) \ + (offsetof(struct kvm_regs, name) / sizeof(unsigned long)) + +/* Control and status registers are mapped as type 3 */ +#define KVM_REG_RISCV_CSR (0x03 << KVM_REG_RISCV_TYPE_SHIFT) +#define KVM_REG_RISCV_CSR_REG(name) \ + (offsetof(struct kvm_sregs, name) / sizeof(unsigned long)) + #endif #endif /* __LINUX_KVM_RISCV_H */ diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index 8e1ebdf1ef15..c6877d9e229a 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -161,6 +161,215 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf) return VM_FAULT_SIGBUS; } +static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_CONFIG); + unsigned long reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) + return -EINVAL; + + switch (reg_num) { + case KVM_REG_RISCV_CONFIG_ISA: + reg_val = vcpu->arch.isa; + break; + case KVM_REG_RISCV_CONFIG_TIMEBASE: + reg_val = riscv_timebase; + break; + default: + return -EINVAL; + }; + + if (copy_to_user(uaddr, ®_val, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + return 0; +} + +static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_CONFIG); + unsigned long reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) + return -EINVAL; + + if (copy_from_user(®_val, uaddr, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + switch (reg_num) { + case KVM_REG_RISCV_CONFIG_ISA: + if (!vcpu->arch.ran_atleast_once) { + vcpu->arch.isa = reg_val; + vcpu->arch.isa &= riscv_isa; + vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED; + } else { + return -ENOTSUPP; + } + break; + case KVM_REG_RISCV_CONFIG_TIMEBASE: + return -ENOTSUPP; + default: + return -EINVAL; + }; + + return 0; +} + +static int kvm_riscv_vcpu_get_reg_core(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + struct kvm_cpu_context *cntx = &vcpu->arch.guest_context; + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_CORE); + unsigned long reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) + return -EINVAL; + + if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc)) + reg_val = cntx->sepc; + else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num && + reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6)) + reg_val = ((unsigned long *)cntx)[reg_num]; + else if (reg_num == KVM_REG_RISCV_CORE_REG(mode)) + reg_val = (cntx->sstatus & SR_SPP) ? + KVM_RISCV_MODE_S : KVM_RISCV_MODE_U; + else + return -EINVAL; + + if (copy_to_user(uaddr, ®_val, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + return 0; +} + +static int kvm_riscv_vcpu_set_reg_core(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + struct kvm_cpu_context *cntx = &vcpu->arch.guest_context; + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_CORE); + unsigned long reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) + return -EINVAL; + + if (copy_from_user(®_val, uaddr, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc)) + cntx->sepc = reg_val; + else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num && + reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6)) + ((unsigned long *)cntx)[reg_num] = reg_val; + else if (reg_num == KVM_REG_RISCV_CORE_REG(mode)) { + if (reg_val == KVM_RISCV_MODE_S) + cntx->sstatus |= SR_SPP; + else + cntx->sstatus &= ~SR_SPP; + } else + return -EINVAL; + + return 0; +} + +static int kvm_riscv_vcpu_get_reg_csr(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr; + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_CSR); + unsigned long reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) + return -EINVAL; + if (reg_num >= sizeof(struct kvm_sregs) / sizeof(unsigned long)) + return -EINVAL; + + if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) + kvm_riscv_vcpu_flush_interrupts(vcpu, false); + + reg_val = ((unsigned long *)csr)[reg_num]; + + if (copy_to_user(uaddr, ®_val, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + return 0; +} + +static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr; + unsigned long __user *uaddr = + (unsigned long __user *)(unsigned long)reg->addr; + unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK | + KVM_REG_SIZE_MASK | + KVM_REG_RISCV_CSR); + unsigned long reg_val; + + if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long)) + return -EINVAL; + if (reg_num >= sizeof(struct kvm_sregs) / sizeof(unsigned long)) + return -EINVAL; + + if (copy_from_user(®_val, uaddr, KVM_REG_SIZE(reg->id))) + return -EFAULT; + + ((unsigned long *)csr)[reg_num] = reg_val; + + if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) + WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0); + + return 0; +} + +static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CONFIG) + return kvm_riscv_vcpu_set_reg_config(vcpu, reg); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CORE) + return kvm_riscv_vcpu_set_reg_core(vcpu, reg); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR) + return kvm_riscv_vcpu_set_reg_csr(vcpu, reg); + + return -EINVAL; +} + +static int kvm_riscv_vcpu_get_reg(struct kvm_vcpu *vcpu, + const struct kvm_one_reg *reg) +{ + if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CONFIG) + return kvm_riscv_vcpu_get_reg_config(vcpu, reg); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CORE) + return kvm_riscv_vcpu_get_reg_core(vcpu, reg); + else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR) + return kvm_riscv_vcpu_get_reg_csr(vcpu, reg); + + return -EINVAL; +} + long kvm_arch_vcpu_async_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { @@ -185,8 +394,30 @@ long kvm_arch_vcpu_async_ioctl(struct file *filp, long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { - /* TODO: */ - return -EINVAL; + struct kvm_vcpu *vcpu = filp->private_data; + void __user *argp = (void __user *)arg; + long r = -EINVAL; + + switch (ioctl) { + case KVM_SET_ONE_REG: + case KVM_GET_ONE_REG: { + struct kvm_one_reg reg; + + r = -EFAULT; + if (copy_from_user(®, argp, sizeof(reg))) + break; + + if (ioctl == KVM_SET_ONE_REG) + r = kvm_riscv_vcpu_set_reg(vcpu, ®); + else + r = kvm_riscv_vcpu_get_reg(vcpu, ®); + break; + } + default: + break; + } + + return r; } int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access VCPU config and registers from user-space. We have three types of VCPU registers: 1. CONFIG - these are VCPU config and capabilities 2. CORE - these are VCPU general purpose registers 3. CSR - these are VCPU control and status registers The CONFIG registers available to user-space are ISA and TIMEBASE. Out of these, TIMEBASE is a read-only register which inform user-space about VCPU timer base frequency. The ISA register is a read and write register where user-space can only write the desired VCPU ISA capabilities before running the VCPU. The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7, T0-T6, S0-S11 and MODE. Most of these are RISC-V general registers except PC and MODE. The PC register represents program counter whereas the MODE register represent VCPU privilege mode (i.e. S/U-mode). The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC, SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers. In future, more VCPU register types will be added (such as FP) for the KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls. Signed-off-by: Anup Patel <anup.patel@wdc.com> --- arch/riscv/include/uapi/asm/kvm.h | 40 ++++- arch/riscv/kvm/vcpu.c | 235 +++++++++++++++++++++++++++++- 2 files changed, 272 insertions(+), 3 deletions(-)