mbox series

[v5,00/20] KVM RISC-V Support

Message ID 20190822084131.114764-1-anup.patel@wdc.com (mailing list archive)
Headers show
Series KVM RISC-V Support | expand

Message

Anup Patel Aug. 22, 2019, 8:42 a.m. UTC
This series adds initial KVM RISC-V support. Currently, we are able to boot
RISC-V 64bit Linux Guests with multiple VCPUs.

Few key aspects of KVM RISC-V added by this series are:
1. Minimal possible KVM world-switch which touches only GPRs and few CSRs.
2. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure.
3. KVM ONE_REG interface for VCPU register access from user-space.
4. PLIC emulation is done in user-space. In-kernel PLIC emulation, will
   be added in future.
5. Timer and IPI emuation is done in-kernel.
6. MMU notifiers supported.
7. FP lazy save/restore supported.
8. SBI v0.1 emulation for KVM Guest available.

Here's a brief TODO list which we will work upon after this series:
1. Handle trap from unpriv access in reading Guest instruction
2. Handle trap from unpriv access in SBI v0.1 emulation
3. Implement recursive stage2 page table programing
4. SBI v0.2 emulation in-kernel
5. SBI v0.2 hart hotplug emulation in-kernel
6. In-kernel PLIC emulation
7. ..... and more .....

This series can be found in riscv_kvm_v5 branch at:
https//github.com/avpatel/linux.git

Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v1 branch at:
https//github.com/avpatel/kvmtool.git

We need OpenSBI with RISC-V hypervisor extension support which can be
found in hyp_ext_changes_v1 branch at:
https://github.com/riscv/opensbi.git

The QEMU RISC-V hypervisor emulation is done by Alistair and is available
in riscv-hyp-work.next branch at:
https://github.com/alistair23/qemu.git

To play around with KVM RISC-V, here are few reference commands:
1) To cross-compile KVMTOOL:
   $ make lkvm-static
2) To launch RISC-V Host Linux:
   $ qemu-system-riscv64 -monitor null -cpu rv64,h=true -M virt \
   -m 512M -display none -serial mon:stdio \
   -kernel opensbi/build/platform/qemu/virt/firmware/fw_jump.elf \
   -device loader,file=build-riscv64/arch/riscv/boot/Image,addr=0x80200000 \
   -initrd ./rootfs_kvm_riscv64.img \
   -append "root=/dev/ram rw console=ttyS0 earlycon=sbi"
3) To launch RISC-V Guest Linux with 9P rootfs:
   $ ./apps/lkvm-static run -m 128 -c2 --console serial \
   -p "console=ttyS0 earlycon=uart8250,mmio,0x3f8" -k ./apps/Image --debug
4) To launch RISC-V Guest Linux with initrd:
   $ ./apps/lkvm-static run -m 128 -c2 --console serial \
   -p "console=ttyS0 earlycon=uart8250,mmio,0x3f8" -k ./apps/Image \
   -i ./apps/rootfs.img --debug

Changes since v4:
- Rebased patches on Linux-5.3-rc5
- Added Paolo's Acked-by and Reviewed-by
- Updated mailing list in MAINTAINERS entry

Changes since v3:
- Moved patch for ISA bitmap from KVM prep series to this series
- Make vsip_shadow as run-time percpu variable instead of compile-time
- Flush Guest TLBs on all Host CPUs whenever we run-out of VMIDs

Changes since v2:
- Removed references of KVM_REQ_IRQ_PENDING from all patches
- Use kvm->srcu within in-kernel KVM run loop
- Added percpu vsip_shadow to track last value programmed in VSIP CSR
- Added comments about irqs_pending and irqs_pending_mask
- Used kvm_arch_vcpu_runnable() in-place-of kvm_riscv_vcpu_has_interrupt()
  in system_opcode_insn()
- Removed unwanted smp_wmb() in kvm_riscv_stage2_vmid_update()
- Use kvm_flush_remote_tlbs() in kvm_riscv_stage2_vmid_update()
- Use READ_ONCE() in kvm_riscv_stage2_update_hgatp() for vmid

Changes since v1:
- Fixed compile errors in building KVM RISC-V as module
- Removed unused kvm_riscv_halt_guest() and kvm_riscv_resume_guest()
- Set KVM_CAP_SYNC_MMU capability only after MMU notifiers are implemented
- Made vmid_version as unsigned long instead of atomic
- Renamed KVM_REQ_UPDATE_PGTBL to KVM_REQ_UPDATE_HGATP
- Renamed kvm_riscv_stage2_update_pgtbl() to kvm_riscv_stage2_update_hgatp()
- Configure HIDELEG and HEDELEG in kvm_arch_hardware_enable()
- Updated ONE_REG interface for CSR access to user-space
- Removed irqs_pending_lock and use atomic bitops instead
- Added separate patch for FP ONE_REG interface
- Added separate patch for updating MAINTAINERS file

Anup Patel (15):
  KVM: RISC-V: Add KVM_REG_RISCV for ONE_REG interface
  RISC-V: Add bitmap reprensenting ISA features common across CPUs
  RISC-V: Add hypervisor extension related CSR defines
  RISC-V: Add initial skeletal KVM support
  RISC-V: KVM: Implement VCPU create, init and destroy functions
  RISC-V: KVM: Implement VCPU interrupts and requests handling
  RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  RISC-V: KVM: Implement VCPU world-switch
  RISC-V: KVM: Handle MMIO exits for VCPU
  RISC-V: KVM: Handle WFI exits for VCPU
  RISC-V: KVM: Implement VMID allocator
  RISC-V: KVM: Implement stage2 page table programming
  RISC-V: KVM: Implement MMU notifiers
  RISC-V: Enable VIRTIO drivers in RV64 and RV32 defconfig
  RISC-V: KVM: Add MAINTAINERS entry

Atish Patra (5):
  RISC-V: Export few kernel symbols
  RISC-V: KVM: Add timer functionality
  RISC-V: KVM: FP lazy save/restore
  RISC-V: KVM: Implement ONE REG interface for FP registers
  RISC-V: KVM: Add SBI v0.1 support

 MAINTAINERS                             |  10 +
 arch/riscv/Kconfig                      |   2 +
 arch/riscv/Makefile                     |   2 +
 arch/riscv/configs/defconfig            |  11 +
 arch/riscv/configs/rv32_defconfig       |  11 +
 arch/riscv/include/asm/csr.h            |  58 ++
 arch/riscv/include/asm/hwcap.h          |  26 +
 arch/riscv/include/asm/kvm_host.h       | 246 ++++++
 arch/riscv/include/asm/kvm_vcpu_timer.h |  32 +
 arch/riscv/include/asm/pgtable-bits.h   |   1 +
 arch/riscv/include/uapi/asm/kvm.h       |  98 +++
 arch/riscv/kernel/asm-offsets.c         | 148 ++++
 arch/riscv/kernel/cpufeature.c          |  79 +-
 arch/riscv/kernel/smp.c                 |   2 +-
 arch/riscv/kernel/time.c                |   1 +
 arch/riscv/kvm/Kconfig                  |  34 +
 arch/riscv/kvm/Makefile                 |  14 +
 arch/riscv/kvm/main.c                   |  92 +++
 arch/riscv/kvm/mmu.c                    | 905 ++++++++++++++++++++++
 arch/riscv/kvm/tlb.S                    |  43 ++
 arch/riscv/kvm/vcpu.c                   | 989 ++++++++++++++++++++++++
 arch/riscv/kvm/vcpu_exit.c              | 556 +++++++++++++
 arch/riscv/kvm/vcpu_sbi.c               | 119 +++
 arch/riscv/kvm/vcpu_switch.S            | 368 +++++++++
 arch/riscv/kvm/vcpu_timer.c             | 106 +++
 arch/riscv/kvm/vm.c                     |  86 +++
 arch/riscv/kvm/vmid.c                   | 123 +++
 drivers/clocksource/timer-riscv.c       |   8 +
 include/clocksource/timer-riscv.h       |  16 +
 include/uapi/linux/kvm.h                |   1 +
 30 files changed, 4183 insertions(+), 4 deletions(-)
 create mode 100644 arch/riscv/include/asm/kvm_host.h
 create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
 create mode 100644 arch/riscv/include/uapi/asm/kvm.h
 create mode 100644 arch/riscv/kvm/Kconfig
 create mode 100644 arch/riscv/kvm/Makefile
 create mode 100644 arch/riscv/kvm/main.c
 create mode 100644 arch/riscv/kvm/mmu.c
 create mode 100644 arch/riscv/kvm/tlb.S
 create mode 100644 arch/riscv/kvm/vcpu.c
 create mode 100644 arch/riscv/kvm/vcpu_exit.c
 create mode 100644 arch/riscv/kvm/vcpu_sbi.c
 create mode 100644 arch/riscv/kvm/vcpu_switch.S
 create mode 100644 arch/riscv/kvm/vcpu_timer.c
 create mode 100644 arch/riscv/kvm/vm.c
 create mode 100644 arch/riscv/kvm/vmid.c
 create mode 100644 include/clocksource/timer-riscv.h

--
2.17.1

Comments

Alexander Graf Aug. 23, 2019, 8:08 a.m. UTC | #1
On 22.08.19 10:42, Anup Patel wrote:
> This series adds initial KVM RISC-V support. Currently, we are able to boot
> RISC-V 64bit Linux Guests with multiple VCPUs.
> 
> Few key aspects of KVM RISC-V added by this series are:
> 1. Minimal possible KVM world-switch which touches only GPRs and few CSRs.
> 2. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure.
> 3. KVM ONE_REG interface for VCPU register access from user-space.
> 4. PLIC emulation is done in user-space. In-kernel PLIC emulation, will
>     be added in future.
> 5. Timer and IPI emuation is done in-kernel.
> 6. MMU notifiers supported.
> 7. FP lazy save/restore supported.
> 8. SBI v0.1 emulation for KVM Guest available.
> 
> Here's a brief TODO list which we will work upon after this series:
> 1. Handle trap from unpriv access in reading Guest instruction
> 2. Handle trap from unpriv access in SBI v0.1 emulation
> 3. Implement recursive stage2 page table programing
> 4. SBI v0.2 emulation in-kernel
> 5. SBI v0.2 hart hotplug emulation in-kernel
> 6. In-kernel PLIC emulation
> 7. ..... and more .....

Please consider patches I did not comment on as

Reviewed-by: Alexander Graf <graf@amazon.com>

Overall, I'm quite happy with the code. It's a very clean implementation 
of a KVM target.

The only major nit I have is the guest address space read: I don't think 
we should pull in code that we know allows user space to DOS the kernel. 
For that, we need to find an alternative. Either you implement a 
software page table walker and resolve VAs manually or you find a way to 
ensure that *any* exception taken during the read does not affect 
general code execution.


Thanks,

Alex
Anup Patel Aug. 23, 2019, 11:25 a.m. UTC | #2
On Fri, Aug 23, 2019 at 1:39 PM Alexander Graf <graf@amazon.com> wrote:
>
> On 22.08.19 10:42, Anup Patel wrote:
> > This series adds initial KVM RISC-V support. Currently, we are able to boot
> > RISC-V 64bit Linux Guests with multiple VCPUs.
> >
> > Few key aspects of KVM RISC-V added by this series are:
> > 1. Minimal possible KVM world-switch which touches only GPRs and few CSRs.
> > 2. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure.
> > 3. KVM ONE_REG interface for VCPU register access from user-space.
> > 4. PLIC emulation is done in user-space. In-kernel PLIC emulation, will
> >     be added in future.
> > 5. Timer and IPI emuation is done in-kernel.
> > 6. MMU notifiers supported.
> > 7. FP lazy save/restore supported.
> > 8. SBI v0.1 emulation for KVM Guest available.
> >
> > Here's a brief TODO list which we will work upon after this series:
> > 1. Handle trap from unpriv access in reading Guest instruction
> > 2. Handle trap from unpriv access in SBI v0.1 emulation
> > 3. Implement recursive stage2 page table programing
> > 4. SBI v0.2 emulation in-kernel
> > 5. SBI v0.2 hart hotplug emulation in-kernel
> > 6. In-kernel PLIC emulation
> > 7. ..... and more .....
>
> Please consider patches I did not comment on as
>
> Reviewed-by: Alexander Graf <graf@amazon.com>
>
> Overall, I'm quite happy with the code. It's a very clean implementation
> of a KVM target.

Thanks Alex.

>
> The only major nit I have is the guest address space read: I don't think
> we should pull in code that we know allows user space to DOS the kernel.
> For that, we need to find an alternative. Either you implement a
> software page table walker and resolve VAs manually or you find a way to
> ensure that *any* exception taken during the read does not affect
> general code execution.

I will send v6 next week. I will try my best to implement unpriv trap
handling in v6 itself.

Regards,
Anup

>
>
> Thanks,
>
> Alex
Alexander Graf Aug. 23, 2019, 11:44 a.m. UTC | #3
> Am 23.08.2019 um 13:26 schrieb Anup Patel <anup@brainfault.org>:
> 
>> On Fri, Aug 23, 2019 at 1:39 PM Alexander Graf <graf@amazon.com> wrote:
>> 
>>> On 22.08.19 10:42, Anup Patel wrote:
>>> This series adds initial KVM RISC-V support. Currently, we are able to boot
>>> RISC-V 64bit Linux Guests with multiple VCPUs.
>>> 
>>> Few key aspects of KVM RISC-V added by this series are:
>>> 1. Minimal possible KVM world-switch which touches only GPRs and few CSRs.
>>> 2. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure.
>>> 3. KVM ONE_REG interface for VCPU register access from user-space.
>>> 4. PLIC emulation is done in user-space. In-kernel PLIC emulation, will
>>>    be added in future.
>>> 5. Timer and IPI emuation is done in-kernel.
>>> 6. MMU notifiers supported.
>>> 7. FP lazy save/restore supported.
>>> 8. SBI v0.1 emulation for KVM Guest available.
>>> 
>>> Here's a brief TODO list which we will work upon after this series:
>>> 1. Handle trap from unpriv access in reading Guest instruction
>>> 2. Handle trap from unpriv access in SBI v0.1 emulation
>>> 3. Implement recursive stage2 page table programing
>>> 4. SBI v0.2 emulation in-kernel
>>> 5. SBI v0.2 hart hotplug emulation in-kernel
>>> 6. In-kernel PLIC emulation
>>> 7. ..... and more .....
>> 
>> Please consider patches I did not comment on as
>> 
>> Reviewed-by: Alexander Graf <graf@amazon.com>
>> 
>> Overall, I'm quite happy with the code. It's a very clean implementation
>> of a KVM target.
> 
> Thanks Alex.
> 
>> 
>> The only major nit I have is the guest address space read: I don't think
>> we should pull in code that we know allows user space to DOS the kernel.
>> For that, we need to find an alternative. Either you implement a
>> software page table walker and resolve VAs manually or you find a way to
>> ensure that *any* exception taken during the read does not affect
>> general code execution.
> 
> I will send v6 next week. I will try my best to implement unpriv trap
> handling in v6 itself.

Are you sure unpriv is the only exception that can hit there? What about NMIs? Do you have #MCs yet (ECC errors)? Do you have something like ARM's #SError which can asynchronously hit at any time because of external bus (PCI) errors?

Alex

> 
> Regards,
> Anup
> 
>> 
>> 
>> Thanks,
>> 
>> Alex
Paolo Bonzini Aug. 23, 2019, 12:10 p.m. UTC | #4
On 23/08/19 13:44, Graf (AWS), Alexander wrote:
>> Overall, I'm quite happy with the code. It's a very clean implementation
>> of a KVM target.

Yup, I said the same even for v1 (I prefer recursive implementation of
page table walking but that's all I can say).

>> I will send v6 next week. I will try my best to implement unpriv
>> trap handling in v6 itself.
> Are you sure unpriv is the only exception that can hit there? What
> about NMIs? Do you have #MCs yet (ECC errors)? Do you have something
> like ARM's #SError which can asynchronously hit at any time because
> of external bus (PCI) errors?

As far as I know, all interrupts on RISC-V are disabled by
local_irq_disable()/local_irq_enable().

Paolo
Anup Patel Aug. 23, 2019, 12:19 p.m. UTC | #5
On Fri, Aug 23, 2019 at 5:40 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 23/08/19 13:44, Graf (AWS), Alexander wrote:
> >> Overall, I'm quite happy with the code. It's a very clean implementation
> >> of a KVM target.
>
> Yup, I said the same even for v1 (I prefer recursive implementation of
> page table walking but that's all I can say).
>
> >> I will send v6 next week. I will try my best to implement unpriv
> >> trap handling in v6 itself.
> > Are you sure unpriv is the only exception that can hit there? What
> > about NMIs? Do you have #MCs yet (ECC errors)? Do you have something
> > like ARM's #SError which can asynchronously hit at any time because
> > of external bus (PCI) errors?
>
> As far as I know, all interrupts on RISC-V are disabled by
> local_irq_disable()/local_irq_enable().

Yes, we don't have per-CPU interrupts for async bus errors or
non-maskable interrupts. The local_irq_disable() and local_irq_enable()
affect all interrupts (excepts traps).

Although, the async bus errors can certainly be routed to Linux
via PLIC (interrupt-controller) as regular peripheral interrupts.

Regards,
Anup

>
> Paolo
Alexander Graf Aug. 23, 2019, 12:28 p.m. UTC | #6
On 23.08.19 14:19, Anup Patel wrote:
> On Fri, Aug 23, 2019 at 5:40 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 23/08/19 13:44, Graf (AWS), Alexander wrote:
>>>> Overall, I'm quite happy with the code. It's a very clean implementation
>>>> of a KVM target.
>>
>> Yup, I said the same even for v1 (I prefer recursive implementation of
>> page table walking but that's all I can say).
>>
>>>> I will send v6 next week. I will try my best to implement unpriv
>>>> trap handling in v6 itself.
>>> Are you sure unpriv is the only exception that can hit there? What
>>> about NMIs? Do you have #MCs yet (ECC errors)? Do you have something
>>> like ARM's #SError which can asynchronously hit at any time because
>>> of external bus (PCI) errors?
>>
>> As far as I know, all interrupts on RISC-V are disabled by
>> local_irq_disable()/local_irq_enable().
> 
> Yes, we don't have per-CPU interrupts for async bus errors or
> non-maskable interrupts. The local_irq_disable() and local_irq_enable()
> affect all interrupts (excepts traps).

Awesome, so that means you really only need to worry about traps. Even 
easier then! :)

Also, you want to look out for a future extension that adds any of the 
above (NMI, MCE, SError on local bus), as that would then break the 
function ;)


Alex