[v16,00/17] KVM RISC-V Support

Message ID: 20210115121846.114528-1-anup.patel@wdc.com

Anup Patel Jan. 15, 2021, 12:18 p.m. UTC
This series adds initial KVM RISC-V support. Currently, we are able to boot
Linux on RV64/RV32 Guest with multiple VCPUs.

Key aspects of KVM RISC-V added by this series are:
1. No RISC-V specific KVM IOCTL
2. Minimal possible KVM world-switch which touches only GPRs and a few CSRs
3. Both RV64 and RV32 host supported
4. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure
5. KVM ONE_REG interface for VCPU register access from user-space
6. PLIC emulation is done in user-space
7. Timer and IPI emulation is done in-kernel
8. Both Sv39x4 and Sv48x4 supported for RV64 host
9. MMU notifiers supported
10. Generic dirtylog supported
11. FP lazy save/restore supported
12. SBI v0.1 emulation for KVM Guest available
13. Forward unhandled SBI calls to KVM userspace
14. Hugepage support for Guest/VM
15. IOEVENTFD support for Vhost

Here's a brief TODO list which we will work on after this series:
1. SBI v0.2 emulation in-kernel
2. SBI v0.2 hart state management emulation in-kernel
3. In-kernel PLIC emulation
4. ..... and more .....

This series can be found in riscv_kvm_v16 branch at:
https://github.com/avpatel/linux.git

Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v6 branch
at: https://github.com/avpatel/kvmtool.git

The QEMU RISC-V hypervisor emulation is done by Alistair and is available
in master branch at: https://git.qemu.org/git/qemu.git

To play around with KVM RISC-V, refer to the KVM RISC-V wiki at:
https://github.com/kvm-riscv/howto/wiki
https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-QEMU
https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-Spike

Changes since v15:
 - Rebased on Linux-5.11-rc3
 - Fixed kvm_stage2_map() to use gfn_to_pfn_prot() for determining
   writability of a host pfn.
 - Use "__u64" in-place of "u64" and "__u32" in-place of "u32" for
   uapi/asm/kvm.h

Changes since v14:
 - Rebased on Linux-5.10-rc3
 - Fixed Stage2 (G-stage) PGD allocation to ensure it is 16KB aligned
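
As background for the 16KB alignment fix above: with Sv39x4 and Sv48x4 the
guest physical address is 2 bits wider than the corresponding Sv39/Sv48
virtual address, so the stage2 root page table is four times the size of a
regular 4KB page table and has to be naturally aligned. A minimal sketch of
that arithmetic, assuming 4KB base pages; the macro and helper names here are
illustrative and are not taken from the patches:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT        12   /* assumption: 4KB base pages */
#define STAGE2_PGD_XBITS  2    /* the "x4" in Sv39x4/Sv48x4: 2 extra address bits */
#define STAGE2_PGD_SIZE   (1ULL << (PAGE_SHIFT + STAGE2_PGD_XBITS))

/* The root table is 4 * 4KB = 16KB. */
static_assert(STAGE2_PGD_SIZE == 16 * 1024, "stage2 root table must be 16KB");

/* Hypothetical helper: true if a root table physical address is 16KB aligned. */
static inline bool stage2_pgd_is_aligned(uint64_t pgd_phys)
{
    return (pgd_phys & (STAGE2_PGD_SIZE - 1)) == 0;
}
```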

Changes since v13:
 - Rebased on Linux-5.9-rc3
 - Fixed kvm_riscv_vcpu_set_reg_csr() for SIP update in PATCH5
 - Fixed instruction length computation in PATCH7
 - Added ioeventfd support in PATCH7
 - Ensure HSTATUS.SPVP is set to correct value before using HLV/HSV
   instructions in PATCH7
 - Fixed stage2_map_page() to set PTE 'A' and 'D' bits correctly
   in PATCH10
 - Added stage2 dirty page logging in PATCH10
 - Allow KVM user-space to SET/GET SCOUNTEREN CSR in PATCH5
 - Save/restore SCOUNTEREN in PATCH6
 - Reduced quite a few instructions for __kvm_riscv_switch_to() by
   using CSR swap instruction in PATCH6
 - Detect and use Sv48x4 when available in PATCH10
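
For readers following the instruction length fix listed above: in the base
RISC-V encoding, an instruction whose two lowest bits are both set is 32 bits
wide, and any other pattern is a 16-bit compressed instruction. The sketch
below shows that rule in isolation. The macro and helper names are
illustrative and may not match PATCH7, and encodings longer than 32 bits are
deliberately ignored:

```c
#include <assert.h>
#include <stdint.h>

#define INSN_16BIT_MASK      0x3u
#define INSN_IS_16BIT(insn)  (((insn) & INSN_16BIT_MASK) != INSN_16BIT_MASK)
#define INSN_LEN(insn)       (INSN_IS_16BIT(insn) ? 2 : 4)

static_assert(INSN_LEN(0x00000013u) == 4, "addi x0, x0, 0 is a 32-bit instruction");
static_assert(INSN_LEN(0x0001u) == 2, "c.nop is a 16-bit instruction");

/* Hypothetical helper: advance a trapped guest PC past the emulated instruction. */
static inline uint64_t skip_insn(uint64_t sepc, uint32_t insn)
{
    return sepc + INSN_LEN(insn);
}
```

Getting this length right matters when an emulated instruction is skipped,
because the guest PC has to move forward by exactly 2 or 4 bytes.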

Changes since v12:
 - Rebased patches on Linux-5.8-rc4
 - By default enable all counters in HCOUNTEREN
 - RISC-V H-Extension v0.6.1 spec support

Changes since v11:
 - Rebased patches on Linux-5.7-rc3
 - Fixed typo in typecast of stage2_map_size define
 - Introduced struct kvm_cpu_trap to represent trap details and
   use it as function parameter wherever applicable
 - Pass memslot to kvm_riscv_stage2_map() for supporting dirty page
   logging in future
 - RISC-V H-Extension v0.6 spec support
 - Sent out the first three patches as a separate series so that they can
   be taken by Palmer for Linux RISC-V

Changes since v10:
 - Rebased patches on Linux-5.6-rc5
 - Reduce RISCV_ISA_EXT_MAX from 256 to 64
 - Separate PATCH for removing N-extension related defines
 - Added comments as requested by Palmer
 - Fixed HIDELEG CSR programming

Changes since v9:
 - Rebased patches on Linux-5.5-rc3
 - Squash PATCH19 and PATCH20 into PATCH5
 - Squash PATCH18 into PATCH11
 - Squash PATCH17 into PATCH16
 - Added ONE_REG interface for VCPU timer in PATCH13
 - Use HTIMEDELTA for VCPU timer in PATCH13
 - Updated KVM RISC-V mailing list in MAINTAINERS entry
 - Update KVM kconfig option to depend on RISCV_SBI and MMU
 - Check for SBI v0.2 and SBI v0.2 RFENCE extension at boot-time
 - Use SBI v0.2 RFENCE extension in VMID implementation
 - Use SBI v0.2 RFENCE extension in Stage2 MMU implementation
 - Use SBI v0.2 RFENCE extension in SBI implementation
 - Moved to RISC-V Hypervisor v0.5 draft spec
 - Updated Documentation/virt/kvm/api.txt for timer ONE_REG interface
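
To illustrate the HTIMEDELTA item above: the hypervisor extension defines the
time value seen by a guest as the host time plus the htimedelta CSR, so a
per-VM offset is enough to give each guest its own time base. A rough sketch
of the arithmetic, with hypothetical helper names that do not come from
PATCH13:

```c
#include <assert.h>
#include <stdint.h>

/* Offset to program so that the guest reads guest_now when the host reads host_now. */
static inline uint64_t htimedelta_for(uint64_t host_now, uint64_t guest_now)
{
    return guest_now - host_now;    /* unsigned arithmetic wraps modulo 2^64 */
}

/* Time value the guest observes for a given host time and offset. */
static inline uint64_t guest_time(uint64_t host_now, uint64_t htimedelta)
{
    return host_now + htimedelta;
}

/* Usage: a guest created at host time 1000 starts counting from zero. */
static inline void htimedelta_example(void)
{
    uint64_t delta = htimedelta_for(1000, 0);

    assert(guest_time(1000, delta) == 0);
    assert(guest_time(1500, delta) == 500);
}
```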

Changes since v8:
 - Rebased series on Linux-5.4-rc3 and Atish's SBI v0.2 patches
 - Use HRTIMER_MODE_REL instead of HRTIMER_MODE_ABS in timer emulation
 - Fixed kvm_riscv_stage2_map() to handle hugepages
 - Added patch to forward unhandled SBI calls to user-space
 - Added patch for iterative/recursive stage2 page table programming
 - Added patch to remove per-CPU vsip_shadow variable
 - Added patch to fix race-condition in kvm_riscv_vcpu_sync_interrupts()

Changes since v7:
 - Rebased series on Linux-5.4-rc1 and Atish's SBI v0.2 patches
 - Removed PATCH1, PATCH3, and PATCH20 because these are already merged
 - Use kernel doc style comments for ISA bitmap functions
 - Don't parse X, Y, and Z extensions in riscv_fill_hwcap() because they will
   be added in the future
 - Mark KVM RISC-V kconfig option as EXPERIMENTAL
 - Typo fix in commit description of PATCH6 of v7 series
 - Use separate structs for CORE and CSR registers of ONE_REG interface
 - Explicitly include asm/sbi.h in kvm/vcpu_sbi.c
 - Removed implicit switch-case fall-through in kvm_riscv_vcpu_exit()
 - No need to set VSSTATUS.MXR bit in kvm_riscv_vcpu_unpriv_read()
 - Removed register for instruction length in kvm_riscv_vcpu_unpriv_read()
 - Added defines for checking/decoding instruction length
 - Added separate patch to forward unhandled SBI calls to userspace tool

Changes since v6:
 - Rebased patches on Linux-5.3-rc7
 - Added "return_handled" in struct kvm_mmio_decode to ensure that
   kvm_riscv_vcpu_mmio_return() updates SEPC only once
 - Removed trap_stval parameter from kvm_riscv_vcpu_unpriv_read()
 - Updated git repo URL in MAINTAINERS entry

Changes since v5:
 - Renamed KVM_REG_RISCV_CONFIG_TIMEBASE register to
   KVM_REG_RISCV_CONFIG_TBFREQ register in ONE_REG interface
 - Update SEPC in kvm_riscv_vcpu_mmio_return() for MMIO exits
 - Use switch case instead of illegal instruction opcode table for simplicity
 - Improve comments in stage2_remote_tlb_flush() for a potential remote TLB
   flush optimization
 - Handle all unsupported SBI calls in default case of
   kvm_riscv_vcpu_sbi_ecall() function
 - Fixed kvm_riscv_vcpu_sync_interrupts() for software interrupts
 - Improved unprivileged reads to handle traps due to Guest stage1 page table
 - Added separate patch to document RISC-V specific things in
   Documentation/virt/kvm/api.txt

Changes since v4:
 - Rebased patches on Linux-5.3-rc5
 - Added Paolo's Acked-by and Reviewed-by
 - Updated mailing list in MAINTAINERS entry

Changes since v3:
 - Moved patch for ISA bitmap from KVM prep series to this series
 - Make vsip_shadow a run-time percpu variable instead of compile-time
 - Flush Guest TLBs on all Host CPUs whenever we run out of VMIDs
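
The VMID rollover behaviour described above can be pictured with a small
versioned allocator: each VM caches the allocator version it last saw, and
when the VMID space runs out the version is bumped and every guest TLB entry
must be flushed. This is a simplified, single-threaded sketch under the
assumption that VMID 0 is not handed out to guests; the structure and function
names are hypothetical, and a real allocator also needs locking and the
cross-CPU flush itself:

```c
#include <assert.h>
#include <stdbool.h>

struct vmid_allocator {
    unsigned long version;  /* bumped on every rollover */
    unsigned long next;     /* next VMID to hand out */
    unsigned long bits;     /* number of implemented VMID bits */
};

struct vm_vmid {
    unsigned long version;  /* allocator version when vmid was assigned */
    unsigned long vmid;
};

/*
 * Give the VM a VMID that is valid for the current allocator version.
 * Returns true when the VMID space rolled over, in which case the
 * caller must flush guest TLB entries on all host CPUs.
 */
static inline bool vmid_update(struct vmid_allocator *a, struct vm_vmid *v)
{
    bool flush = false;

    assert(a->bits > 0 && a->bits < 8 * sizeof(unsigned long));

    if (v->version == a->version)
        return false;       /* cached VMID is still valid */

    if (a->next == (1UL << a->bits)) {
        a->version++;       /* invalidates every cached VMID */
        a->next = 1;        /* assumption: VMID 0 stays reserved */
        flush = true;
    }

    v->vmid = a->next++;
    v->version = a->version;
    return flush;
}
```

In this sketch the allocator starts with version 1 and next 1, and a new VM
starts with version 0 so that its first update always assigns a VMID.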

Changes since v2:
 - Removed references of KVM_REQ_IRQ_PENDING from all patches
 - Use kvm->srcu within in-kernel KVM run loop
 - Added percpu vsip_shadow to track last value programmed in VSIP CSR
 - Added comments about irqs_pending and irqs_pending_mask
 - Used kvm_arch_vcpu_runnable() in-place-of kvm_riscv_vcpu_has_interrupt()
   in system_opcode_insn()
 - Removed unwanted smp_wmb() in kvm_riscv_stage2_vmid_update()
 - Use kvm_flush_remote_tlbs() in kvm_riscv_stage2_vmid_update()
 - Use READ_ONCE() in kvm_riscv_stage2_update_hgatp() for vmid

Changes since v1:
 - Fixed compile errors in building KVM RISC-V as module
 - Removed unused kvm_riscv_halt_guest() and kvm_riscv_resume_guest()
 - Set KVM_CAP_SYNC_MMU capability only after MMU notifiers are implemented
 - Made vmid_version unsigned long instead of atomic
 - Renamed KVM_REQ_UPDATE_PGTBL to KVM_REQ_UPDATE_HGATP
 - Renamed kvm_riscv_stage2_update_pgtbl() to kvm_riscv_stage2_update_hgatp()
 - Configure HIDELEG and HEDELEG in kvm_arch_hardware_enable()
 - Updated ONE_REG interface for CSR access to user-space
 - Removed irqs_pending_lock and use atomic bitops instead
 - Added separate patch for FP ONE_REG interface
 - Added separate patch for updating MAINTAINERS file

Anup Patel (13):
  RISC-V: Add hypervisor extension related CSR defines
  RISC-V: Add initial skeletal KVM support
  RISC-V: KVM: Implement VCPU create, init and destroy functions
  RISC-V: KVM: Implement VCPU interrupts and requests handling
  RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  RISC-V: KVM: Implement VCPU world-switch
  RISC-V: KVM: Handle MMIO exits for VCPU
  RISC-V: KVM: Handle WFI exits for VCPU
  RISC-V: KVM: Implement VMID allocator
  RISC-V: KVM: Implement stage2 page table programming
  RISC-V: KVM: Implement MMU notifiers
  RISC-V: KVM: Document RISC-V specific parts of KVM API
  RISC-V: KVM: Add MAINTAINERS entry

Atish Patra (4):
  RISC-V: KVM: Add timer functionality
  RISC-V: KVM: FP lazy save/restore
  RISC-V: KVM: Implement ONE REG interface for FP registers
  RISC-V: KVM: Add SBI v0.1 support

 Documentation/virt/kvm/api.rst          |  193 ++++-
 MAINTAINERS                             |   11 +
 arch/riscv/Kconfig                      |    1 +
 arch/riscv/Makefile                     |    2 +
 arch/riscv/include/asm/csr.h            |   89 ++
 arch/riscv/include/asm/kvm_host.h       |  278 +++++++
 arch/riscv/include/asm/kvm_types.h      |    7 +
 arch/riscv/include/asm/kvm_vcpu_timer.h |   44 +
 arch/riscv/include/asm/pgtable-bits.h   |    1 +
 arch/riscv/include/uapi/asm/kvm.h       |  128 +++
 arch/riscv/kernel/asm-offsets.c         |  156 ++++
 arch/riscv/kvm/Kconfig                  |   36 +
 arch/riscv/kvm/Makefile                 |   15 +
 arch/riscv/kvm/main.c                   |  118 +++
 arch/riscv/kvm/mmu.c                    |  860 +++++++++++++++++++
 arch/riscv/kvm/tlb.S                    |   74 ++
 arch/riscv/kvm/vcpu.c                   | 1012 +++++++++++++++++++++++
 arch/riscv/kvm/vcpu_exit.c              |  701 ++++++++++++++++
 arch/riscv/kvm/vcpu_sbi.c               |  173 ++++
 arch/riscv/kvm/vcpu_switch.S            |  400 +++++++++
 arch/riscv/kvm/vcpu_timer.c             |  225 +++++
 arch/riscv/kvm/vm.c                     |   81 ++
 arch/riscv/kvm/vmid.c                   |  120 +++
 drivers/clocksource/timer-riscv.c       |    8 +
 include/clocksource/timer-riscv.h       |   16 +
 include/uapi/linux/kvm.h                |    8 +
 26 files changed, 4748 insertions(+), 9 deletions(-)
 create mode 100644 arch/riscv/include/asm/kvm_host.h
 create mode 100644 arch/riscv/include/asm/kvm_types.h
 create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
 create mode 100644 arch/riscv/include/uapi/asm/kvm.h
 create mode 100644 arch/riscv/kvm/Kconfig
 create mode 100644 arch/riscv/kvm/Makefile
 create mode 100644 arch/riscv/kvm/main.c
 create mode 100644 arch/riscv/kvm/mmu.c
 create mode 100644 arch/riscv/kvm/tlb.S
 create mode 100644 arch/riscv/kvm/vcpu.c
 create mode 100644 arch/riscv/kvm/vcpu_exit.c
 create mode 100644 arch/riscv/kvm/vcpu_sbi.c
 create mode 100644 arch/riscv/kvm/vcpu_switch.S
 create mode 100644 arch/riscv/kvm/vcpu_timer.c
 create mode 100644 arch/riscv/kvm/vm.c
 create mode 100644 arch/riscv/kvm/vmid.c
 create mode 100644 include/clocksource/timer-riscv.h

Comments

Palmer Dabbelt Jan. 23, 2021, 3:40 a.m. UTC | #1
On Fri, 15 Jan 2021 04:18:29 PST (-0800), Anup Patel wrote:
> This series adds initial KVM RISC-V support. Currently, we are able to boot
> Linux on RV64/RV32 Guest with multiple VCPUs.

Thanks.  IIUC the spec is still in limbo at the RISC-V foundation?  I haven't
really been paying attention lately.

Anup Patel March 30, 2021, 5:48 a.m. UTC | #2
On Sat, Jan 23, 2021 at 9:10 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote:
>
> On Fri, 15 Jan 2021 04:18:29 PST (-0800), Anup Patel wrote:
> > This series adds initial KVM RISC-V support. Currently, we are able to boot
> > Linux on RV64/RV32 Guest with multiple VCPUs.
>
> Thanks.  IIUC the spec is still in limbo at the RISC-V foundation?  I haven't
> really been paying attention lately.

There has been no change in the H-extension spec for more than a year now.

The H-extension spec also has a provision for an external interrupt controller
with virtualization support (such as the RISC-V AIA specification).

It seems Andrew does not want to freeze the H-extension until we have a
virtualization-aware interrupt controller (such as the RISC-V AIA
specification) and an IOMMU. A lot of us feel that these things can be done
independently because the RISC-V H-extension already has provisions for an
external interrupt controller with virtualization support.

The freeze criteria for the H-extension are still not clear to me.
Refer to: https://lists.riscv.org/g/tech-privileged/topic/risc_v_h_extension_freeze/80346318?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,80346318

Regards,
Anup

>
> >
> > Key aspects of KVM RISC-V added by this series are:
> > 1. No RISC-V specific KVM IOCTL
> > 2. Minimal possible KVM world-switch which touches only GPRs and few CSRs
> > 3. Both RV64 and RV32 host supported
> > 4. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure
> > 5. KVM ONE_REG interface for VCPU register access from user-space
> > 6. PLIC emulation is done in user-space
> > 7. Timer and IPI emuation is done in-kernel
> > 8. Both Sv39x4 and Sv48x4 supported for RV64 host
> > 9. MMU notifiers supported
> > 10. Generic dirtylog supported
> > 11. FP lazy save/restore supported
> > 12. SBI v0.1 emulation for KVM Guest available
> > 13. Forward unhandled SBI calls to KVM userspace
> > 14. Hugepage support for Guest/VM
> > 15. IOEVENTFD support for Vhost
> >
> > Here's a brief TODO list which we will work upon after this series:
> > 1. SBI v0.2 emulation in-kernel
> > 2. SBI v0.2 hart state management emulation in-kernel
> > 3. In-kernel PLIC emulation
> > 4. ..... and more .....
> >
> > This series can be found in riscv_kvm_v16 branch at:
> > https//github.com/avpatel/linux.git
> >
> > Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v6 branch
> > at: https//github.com/avpatel/kvmtool.git
> >
> > The QEMU RISC-V hypervisor emulation is done by Alistair and is available
> > in master branch at: https://git.qemu.org/git/qemu.git
> >
> > To play around with KVM RISC-V, refer KVM RISC-V wiki at:
> > https://github.com/kvm-riscv/howto/wiki
> > https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-QEMU
> > https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-Spike
> >
> > Changes since v15:
> >  - Rebased on Linux-5.11-rc3
> >  - Fixed kvm_stage2_map() to use gfn_to_pfn_prot() for determing
> >    writeability of a host pfn.
> >  - Use "__u64" in-place of "u64" and "__u32" in-place of "u32" for
> >    uapi/asm/kvm.h
> >
> > Changes since v14:
> >  - Rebased on Linux-5.10-rc3
> >  - Fixed Stage2 (G-stage) PDG allocation to ensure it is 16KB aligned
> >
> > Changes since v13:
> >  - Rebased on Linux-5.9-rc3
> >  - Fixed kvm_riscv_vcpu_set_reg_csr() for SIP updation in PATCH5
> >  - Fixed instruction length computation in PATCH7
> >  - Added ioeventfd support in PATCH7
> >  - Ensure HSTATUS.SPVP is set to correct value before using HLV/HSV
> >    intructions in PATCH7
> >  - Fixed stage2_map_page() to set PTE 'A' and 'D' bits correctly
> >    in PATCH10
> >  - Added stage2 dirty page logging in PATCH10
> >  - Allow KVM user-space to SET/GET SCOUNTER CSR in PATCH5
> >  - Save/restore SCOUNTEREN in PATCH6
> >  - Reduced quite a few instructions for __kvm_riscv_switch_to() by
> >    using CSR swap instruction in PATCH6
> >  - Detect and use Sv48x4 when available in PATCH10
> >
> > Changes since v12:
> >  - Rebased patches on Linux-5.8-rc4
> >  - By default enable all counters in HCOUNTEREN
> >  - RISC-V H-Extension v0.6.1 spec support
> >
> > Changes since v11:
> >  - Rebased patches on Linux-5.7-rc3
> >  - Fixed typo in typecast of stage2_map_size define
> >  - Introduced struct kvm_cpu_trap to represent trap details and
> >    use it as function parameter wherever applicable
> >  - Pass memslot to kvm_riscv_stage2_map() for supporing dirty page
> >    logging in future
> >  - RISC-V H-Extension v0.6 spec support
> >  - Send-out first three patches as separate series so that it can
> >    be taken by Palmer for Linux RISC-V
> >
> > Changes since v10:
> >  - Rebased patches on Linux-5.6-rc5
> >  - Reduce RISCV_ISA_EXT_MAX from 256 to 64
> >  - Separate PATCH for removing N-extension related defines
> >  - Added comments as requested by Palmer
> >  - Fixed HIDELEG CSR programming
> >
> > Changes since v9:
> >  - Rebased patches on Linux-5.5-rc3
> >  - Squash PATCH19 and PATCH20 into PATCH5
> >  - Squash PATCH18 into PATCH11
> >  - Squash PATCH17 into PATCH16
> >  - Added ONE_REG interface for VCPU timer in PATCH13
> >  - Use HTIMEDELTA for VCPU timer in PATCH13
> >  - Updated KVM RISC-V mailing list in MAINTAINERS entry
> >  - Update KVM kconfig option to depend on RISCV_SBI and MMU
> >  - Check for SBI v0.2 and SBI v0.2 RFENCE extension at boot-time
> >  - Use SBI v0.2 RFENCE extension in VMID implementation
> >  - Use SBI v0.2 RFENCE extension in Stage2 MMU implementation
> >  - Use SBI v0.2 RFENCE extension in SBI implementation
> >  - Moved to RISC-V Hypervisor v0.5 draft spec
> >  - Updated Documentation/virt/kvm/api.txt for timer ONE_REG interface
> >
> > Changes since v8:
> >  - Rebased series on Linux-5.4-rc3 and Atish's SBI v0.2 patches
> >  - Use HRTIMER_MODE_REL instead of HRTIMER_MODE_ABS in timer emulation
> >  - Fixed kvm_riscv_stage2_map() to handle hugepages
> >  - Added patch to forward unhandled SBI calls to user-space
> >  - Added patch for iterative/recursive stage2 page table programming
> >  - Added patch to remove per-CPU vsip_shadow variable
> >  - Added patch to fix race-condition in kvm_riscv_vcpu_sync_interrupts()
> >
> > Changes since v7:
> >  - Rebased series on Linux-5.4-rc1 and Atish's SBI v0.2 patches
> >  - Removed PATCH1, PATCH3, and PATCH20 because these already merged
> >  - Use kernel doc style comments for ISA bitmap functions
> >  - Don't parse X, Y, and Z extension in riscv_fill_hwcap() because it will
> >    be added in-future
> >  - Mark KVM RISC-V kconfig option as EXPERIMENTAL
> >  - Typo fix in commit description of PATCH6 of v7 series
> >  - Use separate structs for CORE and CSR registers of ONE_REG interface
> >  - Explicitly include asm/sbi.h in kvm/vcpu_sbi.c
> >  - Removed implicit switch-case fall-through in kvm_riscv_vcpu_exit()
> >  - No need to set VSSTATUS.MXR bit in kvm_riscv_vcpu_unpriv_read()
> >  - Removed register for instruction length in kvm_riscv_vcpu_unpriv_read()
> >  - Added defines for checking/decoding instruction length
> >  - Added separate patch to forward unhandled SBI calls to userspace tool
> >
> > Changes since v6:
> >  - Rebased patches on Linux-5.3-rc7
> >  - Added "return_handled" in struct kvm_mmio_decode to ensure that
> >    kvm_riscv_vcpu_mmio_return() updates SEPC only once
> >  - Removed trap_stval parameter from kvm_riscv_vcpu_unpriv_read()
> >  - Updated git repo URL in MAINTAINERS entry
> >
> > Changes since v5:
> >  - Renamed KVM_REG_RISCV_CONFIG_TIMEBASE register to
> >    KVM_REG_RISCV_CONFIG_TBFREQ register in ONE_REG interface
> >  - Update SPEC in kvm_riscv_vcpu_mmio_return() for MMIO exits
> >  - Use switch case instead of illegal instruction opcode table for simplicity
> >  - Improve comments in stage2_remote_tlb_flush() for a potential remote TLB
> >   flush optimization
> >  - Handle all unsupported SBI calls in default case of
> >    kvm_riscv_vcpu_sbi_ecall() function
> >  - Fixed kvm_riscv_vcpu_sync_interrupts() for software interrupts
> >  - Improved unprivilege reads to handle traps due to Guest stage1 page table
> >  - Added separate patch to document RISC-V specific things in
> >    Documentation/virt/kvm/api.txt
> >
> > Changes since v4:
> >  - Rebased patches on Linux-5.3-rc5
> >  - Added Paolo's Acked-by and Reviewed-by
> >  - Updated mailing list in MAINTAINERS entry
> >
> > Changes since v3:
> >  - Moved patch for ISA bitmap from KVM prep series to this series
> >  - Make vsip_shadow as run-time percpu variable instead of compile-time
> >  - Flush Guest TLBs on all Host CPUs whenever we run-out of VMIDs
> >
> > Changes since v2:
> >  - Removed references of KVM_REQ_IRQ_PENDING from all patches
> >  - Use kvm->srcu within in-kernel KVM run loop
> >  - Added percpu vsip_shadow to track last value programmed in VSIP CSR
> >  - Added comments about irqs_pending and irqs_pending_mask
> >  - Used kvm_arch_vcpu_runnable() in-place-of kvm_riscv_vcpu_has_interrupt()
> >    in system_opcode_insn()
> >  - Removed unwanted smp_wmb() in kvm_riscv_stage2_vmid_update()
> >  - Use kvm_flush_remote_tlbs() in kvm_riscv_stage2_vmid_update()
> >  - Use READ_ONCE() in kvm_riscv_stage2_update_hgatp() for vmid
> >
> > Changes since v1:
> >  - Fixed compile errors in building KVM RISC-V as module
> >  - Removed unused kvm_riscv_halt_guest() and kvm_riscv_resume_guest()
> >  - Set KVM_CAP_SYNC_MMU capability only after MMU notifiers are implemented
> >  - Made vmid_version an unsigned long instead of atomic
> >  - Renamed KVM_REQ_UPDATE_PGTBL to KVM_REQ_UPDATE_HGATP
> >  - Renamed kvm_riscv_stage2_update_pgtbl() to kvm_riscv_stage2_update_hgatp()
> >  - Configure HIDELEG and HEDELEG in kvm_arch_hardware_enable()
> >  - Updated ONE_REG interface for CSR access to user-space
> >  - Removed irqs_pending_lock and use atomic bitops instead
> >  - Added separate patch for FP ONE_REG interface
> >  - Added separate patch for updating MAINTAINERS file
> >
> > Anup Patel (13):
> >   RISC-V: Add hypervisor extension related CSR defines
> >   RISC-V: Add initial skeletal KVM support
> >   RISC-V: KVM: Implement VCPU create, init and destroy functions
> >   RISC-V: KVM: Implement VCPU interrupts and requests handling
> >   RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
> >   RISC-V: KVM: Implement VCPU world-switch
> >   RISC-V: KVM: Handle MMIO exits for VCPU
> >   RISC-V: KVM: Handle WFI exits for VCPU
> >   RISC-V: KVM: Implement VMID allocator
> >   RISC-V: KVM: Implement stage2 page table programming
> >   RISC-V: KVM: Implement MMU notifiers
> >   RISC-V: KVM: Document RISC-V specific parts of KVM API
> >   RISC-V: KVM: Add MAINTAINERS entry
> >
> > Atish Patra (4):
> >   RISC-V: KVM: Add timer functionality
> >   RISC-V: KVM: FP lazy save/restore
> >   RISC-V: KVM: Implement ONE REG interface for FP registers
> >   RISC-V: KVM: Add SBI v0.1 support
> >
> >  Documentation/virt/kvm/api.rst          |  193 ++++-
> >  MAINTAINERS                             |   11 +
> >  arch/riscv/Kconfig                      |    1 +
> >  arch/riscv/Makefile                     |    2 +
> >  arch/riscv/include/asm/csr.h            |   89 ++
> >  arch/riscv/include/asm/kvm_host.h       |  278 +++++++
> >  arch/riscv/include/asm/kvm_types.h      |    7 +
> >  arch/riscv/include/asm/kvm_vcpu_timer.h |   44 +
> >  arch/riscv/include/asm/pgtable-bits.h   |    1 +
> >  arch/riscv/include/uapi/asm/kvm.h       |  128 +++
> >  arch/riscv/kernel/asm-offsets.c         |  156 ++++
> >  arch/riscv/kvm/Kconfig                  |   36 +
> >  arch/riscv/kvm/Makefile                 |   15 +
> >  arch/riscv/kvm/main.c                   |  118 +++
> >  arch/riscv/kvm/mmu.c                    |  860 +++++++++++++++++++
> >  arch/riscv/kvm/tlb.S                    |   74 ++
> >  arch/riscv/kvm/vcpu.c                   | 1012 +++++++++++++++++++++++
> >  arch/riscv/kvm/vcpu_exit.c              |  701 ++++++++++++++++
> >  arch/riscv/kvm/vcpu_sbi.c               |  173 ++++
> >  arch/riscv/kvm/vcpu_switch.S            |  400 +++++++++
> >  arch/riscv/kvm/vcpu_timer.c             |  225 +++++
> >  arch/riscv/kvm/vm.c                     |   81 ++
> >  arch/riscv/kvm/vmid.c                   |  120 +++
> >  drivers/clocksource/timer-riscv.c       |    8 +
> >  include/clocksource/timer-riscv.h       |   16 +
> >  include/uapi/linux/kvm.h                |    8 +
> >  26 files changed, 4748 insertions(+), 9 deletions(-)
> >  create mode 100644 arch/riscv/include/asm/kvm_host.h
> >  create mode 100644 arch/riscv/include/asm/kvm_types.h
> >  create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
> >  create mode 100644 arch/riscv/include/uapi/asm/kvm.h
> >  create mode 100644 arch/riscv/kvm/Kconfig
> >  create mode 100644 arch/riscv/kvm/Makefile
> >  create mode 100644 arch/riscv/kvm/main.c
> >  create mode 100644 arch/riscv/kvm/mmu.c
> >  create mode 100644 arch/riscv/kvm/tlb.S
> >  create mode 100644 arch/riscv/kvm/vcpu.c
> >  create mode 100644 arch/riscv/kvm/vcpu_exit.c
> >  create mode 100644 arch/riscv/kvm/vcpu_sbi.c
> >  create mode 100644 arch/riscv/kvm/vcpu_switch.S
> >  create mode 100644 arch/riscv/kvm/vcpu_timer.c
> >  create mode 100644 arch/riscv/kvm/vm.c
> >  create mode 100644 arch/riscv/kvm/vmid.c
> >  create mode 100644 include/clocksource/timer-riscv.h
Paolo Bonzini March 31, 2021, 9:21 a.m. UTC | #3
On 30/03/21 07:48, Anup Patel wrote:
> 
> It seems Andrew does not want to freeze H-extension until we have virtualization
> aware interrupt controller (such as RISC-V AIA specification) and IOMMU. A lot
> of us feel that these things can be done independently because RISC-V
> H-extension already has provisions for external interrupt controller with
> virtualization support.

Yes, frankly that's pretty ridiculous as it's perfectly possible to 
emulate the interrupt controller in software (and an IOMMU is not needed 
at all if you are okay with emulated or paravirtualized devices---which 
is almost always the case except for partitioning hypervisors).

Palmer, are you okay with merging RISC-V KVM?  Or should we place it in 
drivers/staging/riscv/kvm?

Either way, the best way to do it would be like this:

1) you apply patch 1 in a topic branch

2) you merge the topic branch in the risc-v tree

3) Anup merges the topic branch too and sends me a pull request.
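
As a rough illustration of the three steps above, here is a sketch that plays them out in a throwaway local repository. It is not the actual maintainer workflow: the branch names riscv-next, kvm-riscv, and topic-hext-csr and the file csr.h are made up for the example, and local branches stand in for the real trees. Only the patch 1 subject line is taken from the series.

```shell
#!/bin/sh
# Sketch of the topic-branch flow in a scratch repository.
# All branch and file names below are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.name "Example"
git config user.email "example@example.com"

# Common base that both maintainer trees start from.
echo base > README
git add README
git commit -q -m "base"
base=$(git rev-parse HEAD)
git branch riscv-next "$base"   # stands in for the RISC-V tree
git branch kvm-riscv "$base"    # stands in for the KVM RISC-V tree

# 1) Apply patch 1 in a topic branch created from the common base.
git checkout -q -b topic-hext-csr "$base"
echo "hypervisor CSR defines" > csr.h
git add csr.h
git commit -q -m "RISC-V: Add hypervisor extension related CSR defines"
topic=$(git rev-parse HEAD)

# 2) Merge the topic branch into the RISC-V tree.
git checkout -q riscv-next
git merge -q --no-ff -m "Merge topic-hext-csr" topic-hext-csr

# 3) Merge the same topic branch into the KVM tree; the pull request
#    would then be prepared from this branch.
git checkout -q kvm-riscv
git merge -q --no-ff -m "Merge topic-hext-csr" topic-hext-csr

# Both trees now contain the identical commit object for patch 1.
git merge-base --is-ancestor "$topic" riscv-next && echo "riscv-next has patch 1"
git merge-base --is-ancestor "$topic" kvm-riscv && echo "kvm-riscv has patch 1"
```

Because both trees merge the same commit rather than each applying its own copy of the patch, the shared change should not conflict or appear twice when the two trees are later pulled upstream.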

Paolo
Anup Patel April 1, 2021, 1:24 p.m. UTC | #4
On Wed, Mar 31, 2021 at 2:52 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 30/03/21 07:48, Anup Patel wrote:
> >
> > It seems Andrew does not want to freeze H-extension until we have virtualization
> > aware interrupt controller (such as RISC-V AIA specification) and IOMMU. A lot
> > of us feel that these things can be done independently because RISC-V
> > H-extension already has provisions for external interrupt controller with
> > virtualization support.
>
> Yes, frankly that's pretty ridiculous as it's perfectly possible to
> emulate the interrupt controller in software (and an IOMMU is not needed
> at all if you are okay with emulated or paravirtualized devices---which
> is almost always the case except for partitioning hypervisors).
>
> Palmer, are you okay with merging RISC-V KVM?  Or should we place it in
> drivers/staging/riscv/kvm?
>
> Either way, the best way to do it would be like this:
>
> 1) you apply patch 1 in a topic branch
>
> 2) you merge the topic branch in the risc-v tree
>
> 3) Anup merges the topic branch too and sends me a pull request.

In any case, I will send v17 based on Linux-5.12-rc5 so that people
can at least try KVM RISC-V based on latest kernel.

Regards,
Anup
Palmer Dabbelt April 9, 2021, 6:58 p.m. UTC | #5
On Wed, 31 Mar 2021 02:21:58 PDT (-0700), pbonzini@redhat.com wrote:
> On 30/03/21 07:48, Anup Patel wrote:
>>
>> It seems Andrew does not want to freeze H-extension until we have virtualization
>> aware interrupt controller (such as RISC-V AIA specification) and IOMMU. A lot
>> of us feel that these things can be done independently because RISC-V
>> H-extension already has provisions for external interrupt controller with
>> virtualization support.

Sorry to hear that.  It's really gotten to a point where I'm just 
embarrassed with how the RISC-V foundation is being run -- not sure if 
these other ones bled into Linux land, but this is the third ISA 
extension that's blown up over the last few weeks.  We had a lot of 
discussion about this on the binutils/GCC side of things and I've 
managed to convince myself that coupling the software stack to the 
specification process isn't viable -- we made that decision under the 
assumption that specifications would actually progress through the 
process, but in practice that's just not happening.

My goal with the RISC-V stuff has always been getting us to a place 
where we have real shipping products running a software stack that is as 
close as possible to the upstream codebases.  I see that as the only way 
to get the software stack to a point where it can be sustainably 
maintained.  The "only frozen extensions" policy was meant to help this 
by steering vendors towards a common base we could support, but in 
practice it's just not working out.  The specification process is just 
so unreliable that in practice everything that gets built ends up 
relying on some non-standard behavior: whether it's a draft extension, 
some vendor-specific extension, or just some implementation quirks.  
There's always going to be some degree of that going on, but over the 
last year or so we've just stopped progressing.

My worry with accepting the draft extensions is that we have no 
guarantee of compatibility between various drafts, which makes 
supporting multiple versions much more difficult.  I've always really 
only been worried about supporting what gets implemented in a chip I can 
actually run code on, as I can at least guarantee that doesn't change.  
In practice that really has nothing to do with the specification freeze: 
even ratified specifications change in ways that break compatibility so 
we need to support multiple versions anyway.  That's why we've got 
things like the K210 support (which doesn't quite follow the ratified 
specs) and are going to take the errata stuff.  I hadn't been all that 
worried about the H support because there was a plan to get it to
hardware, but with the change I'm not really sure how that's going to 
happen.

> Yes, frankly that's pretty ridiculous as it's perfectly possible to
> emulate the interrupt controller in software (and an IOMMU is not needed
> at all if you are okay with emulated or paravirtualized devices---which
> is almost always the case except for partitioning hypervisors).

There's certainly some risk to freezing the H extension before we have 
all flavors of systems up and running.  I spent a lot of time arguing 
that case years ago before we started telling people that the H 
extension just needed implementation, but that's not the decision we 
made.  I don't really do RISC-V foundation stuff any more so I don't 
know why this changed, but it's just too late.  It would be wonderful to 
have an implementation of everything we need to build out one of these 
complex systems, but I just don't see how the current plan gets
there: that's a huge amount of work and I don't see why anyone would 
commit to that when they can't count on it being supported when it's 
released.

There are clearly some systems that can be built with this as it stands.  
They're not going to satisfy every use case, but at least we'll get 
people to start seriously using the spec.  That's the only way I can see 
to move forward with this.  It's pretty clear that sitting around and 
waiting doesn't work, we've tried that.

> Palmer, are you okay with merging RISC-V KVM?  Or should we place it in
> drivers/staging/riscv/kvm?

I'm certainly ready to drop my objections to merging the code based on 
it targeting a draft extension, but at a bare minimum I want to get a 
new policy in place that everyone can agree to for merging code.  I've 
tried to draft up a new policy a handful of times this week, but I'm not 
really quite sure how to go about this: ultimately trying to build 
stable interfaces around an unstable ISA is just a losing battle.  I've 
got a bunch of stuff going on right now, but I'll try to find some time 
to actually sit down and finish one.

I know it might seem odd to complain about how slowly things are going 
and then throw up another roadblock, but I really do think this is a 
very important thing to get right.  I'm just not sure how we're going to 
get anywhere with RISC-V without someone providing stability, so I want 
to make sure that whatever we do here can be done reliably.  If we don't 
I'm worried the vendors are just going to go off and do their own 
software stacks, which will make getting everyone back on the same page 
very difficult.

> Either way, the best way to do it would be like this:
>
> 1) you apply patch 1 in a topic branch
>
> 2) you merge the topic branch in the risc-v tree
>
> 3) Anup merges the topic branch too and sends me a pull request.
>
> Paolo
Anup Patel April 21, 2021, 4:08 a.m. UTC | #6
Hi Palmer,

On Sat, Apr 10, 2021 at 12:28 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Wed, 31 Mar 2021 02:21:58 PDT (-0700), pbonzini@redhat.com wrote:
> > On 30/03/21 07:48, Anup Patel wrote:
> >>
> >> It seems Andrew does not want to freeze H-extension until we have virtualization
> >> aware interrupt controller (such as RISC-V AIA specification) and IOMMU. A lot
> >> of us feel that these things can be done independently because RISC-V
> >> H-extension already has provisions for external interrupt controller with
> >> virtualization support.
>
> Sorry to hear that.  It's really gotten to a point where I'm just
> embarrassed with how the RISC-V foundation is being run -- not sure if
> these other ones bled into Linux land, but this is the third ISA
> extension that's blown up over the last few weeks.  We had a lot of
> discussion about this on the binutils/GCC side of things and I've
> managed to convince myself that coupling the software stack to the
> specification process isn't viable -- we made that decision under the
> assumption that specifications would actually progress through the
> process, but in practice that's just not happening.
>
> My goal with the RISC-V stuff has always been getting us to a place
> where we have real shipping products running a software stack that is as
> close as possible to the upstream codebases.  I see that as the only way
> to get the software stack to a point where it can be sustainably
> maintained.  The "only frozen extensions" policy was meant to help this
> by steering vendors towards a common base we could support, but in
> practice it's just not working out.  The specification process is just
> so unreliable that in practice everything that gets built ends up
> relying on some non-standard behavior: whether it's a draft extension,
> some vendor-specific extension, or just some implementation quirks.
> There's always going to be some degree of that going on, but over the
> last year or so we've just stopped progressing.
>
> My worry with accepting the draft extensions is that we have no
> guarantee of compatibility between various drafts, which makes
> supporting multiple versions much more difficult.  I've always really
> only been worried about supporting what gets implemented in a chip I can
> actually run code on, as I can at least guarantee that doesn't change.
> In practice that really has nothing to do with the specification freeze:
> even ratified specifications change in ways that break compatibility so
> we need to support multiple versions anyway.  That's why we've got
> things like the K210 support (which doesn't quite follow the ratified
> specs) and are going to take the errata stuff.  I hadn't been all that
> worried about the H support because there was a plan to get it to
> hardware, but with the change I'm not really sure how that's going to
> happen.
>
> > Yes, frankly that's pretty ridiculous as it's perfectly possible to
> > emulate the interrupt controller in software (and an IOMMU is not needed
> > at all if you are okay with emulated or paravirtualized devices---which
> > is almost always the case except for partitioning hypervisors).
>
> There's certainly some risk to freezing the H extension before we have
> all flavors of systems up and running.  I spent a lot of time arguing
> that case years ago before we started telling people that the H
> extension just needed implementation, but that's not the decision we
> made.  I don't really do RISC-V foundation stuff any more so I don't
> know why this changed, but it's just too late.  It would be wonderful to
> have an implementation of everything we need to build out one of these
> complex systems, but I just don't see how the current plan gets
> there: that's a huge amount of work and I don't see why anyone would
> commit to that when they can't count on it being supported when it's
> released.
>
> There are clearly some systems that can be built with this as it stands.
> They're not going to satisfy every use case, but at least we'll get
> people to start seriously using the spec.  That's the only way I can see
> to move forward with this.  It's pretty clear that sitting around and
> waiting doesn't work, we've tried that.
>
> > Palmer, are you okay with merging RISC-V KVM?  Or should we place it in
> > drivers/staging/riscv/kvm?
>
> I'm certainly ready to drop my objections to merging the code based on
> it targeting a draft extension, but at a bare minimum I want to get a
> new policy in place that everyone can agree to for merging code.  I've
> tried to draft up a new policy a handful of times this week, but I'm not
> really quite sure how to go about this: ultimately trying to build
> stable interfaces around an unstable ISA is just a losing battle.  I've
> got a bunch of stuff going on right now, but I'll try to find some time
> to actually sit down and finish one.

Can you send the patch for the updated policy so that we can review it?

Will it be possible to get KVM RISC-V merged for Linux-5.13?

Regards,
Anup

>
> I know it might seem odd to complain about how slowly things are going
> and then throw up another roadblock, but I really do think this is a
> very important thing to get right.  I'm just not sure how we're going to
> get anywhere with RISC-V without someone providing stability, so I want
> to make sure that whatever we do here can be done reliably.  If we don't
> I'm worried the vendors are just going to go off and do their own
> software stacks, which will make getting everyone back on the same page
> very difficult.
>
> > Either way, the best way to do it would be like this:
> >
> > 1) you apply patch 1 in a topic branch
> >
> > 2) you merge the topic branch in the risc-v tree
> >
> > 3) Anup merges the topic branch too and sends me a pull request.
> >
> > Paolo
Paul Walmsley April 27, 2021, 5:43 a.m. UTC | #7
On Fri, 9 Apr 2021, Palmer Dabbelt wrote:

> On Wed, 31 Mar 2021 02:21:58 PDT (-0700), pbonzini@redhat.com wrote:
> 
> > Palmer, are you okay with merging RISC-V KVM?  Or should we place it in
> > drivers/staging/riscv/kvm?
> 
> I'm certainly ready to drop my objections to merging the code based on 
> it targeting a draft extension, but at a bare minimum I want to get a 
> new policy in place that everyone can agree to for merging code.  I've 
> tried to draft up a new policy a handful of times this week, but I'm not 
> really quite sure how to go about this: ultimately trying to build 
> stable interfaces around an unstable ISA is just a losing battle.  I've 
> got a bunch of stuff going on right now, but I'll try to find some time 
> to actually sit down and finish one.
> 
> I know it might seem odd to complain about how slowly things are going 
> and then throw up another roadblock, but I really do think this is a 
> very important thing to get right.  I'm just not sure how we're going to 
> get anywhere with RISC-V without someone providing stability, so I want 
> to make sure that whatever we do here can be done reliably.  If we don't 
> I'm worried the vendors are just going to go off and do their own 
> software stacks, which will make getting everyone back on the same page 
> very difficult.

I sympathize with Paolo, Anup, and others also.  Especially Anup, who has 
been updating and carrying the hypervisor patches for a long time now.  
And also Greentime, who has been carrying the V extension patches.  The 
RISC-V hypervisor specification, like several other RISC-V draft 
specifications, is taking longer to transition to the officially "frozen" 
stage than almost anyone in the RISC-V community would like.

Since we share this frustration, the next questions are: 

- What are the root causes of the problem?  

- What's the right forum to address the root causes?

To me, the root causes of the problems described in this thread aren't 
with the arch/riscv kernel maintenance guidelines, but rather with the 
RISC-V specification process itself.  And the right forum to address 
issues with the RISC-V specification process is with RISC-V International 
itself: the mailing lists, the participants, and the board of directors.  
Part of the challenge -- not simply with RISC-V, but with the Linux kernel 
or any other community -- is to ensure that incentives (and disincentives) 
are aligned with the appropriately responsible parts of the community.  
And when it comes to specification development, the right focus to align 
those incentives and disincentives is on RISC-V International.

The arch/riscv patch acceptance guidelines are simply intended to ensure 
that the definition of what is and isn't RISC-V remains clear and 
unambiguous.  Even though the guidelines can result in short-term pain, 
the intention is to promote long-term stability and sustainable 
maintainability - particularly since the specifications get baked into 
hardware.  We've observed that attempting to chase draft specifications 
can cause significant churn: for example, the history of the RISC-V vector 
specification illustrates how a draft extension can undergo major, 
unexpected revisions throughout its journey towards ratification.  One of 
our responsibilities as kernel developers is to minimize that churn - not 
simply for our own sanity, or for the usability of RISC-V, but to ensure 
that we remain members in good standing of the broader kernel community.  
Those of us who were around for the ARM32 and ARM SoC kernel accelerando 
absorbed strong lessons in maintainability, and I doubt anyone here is 
interested in re-learning those the hard way.

RVI states that the association is open to community participation.  The 
organizations that have joined RVI, I believe, have a strong stake in the 
health of the RISC-V ecosystem, just as the folks have here in this 
discussion.  If the goal really is to get quality specifications out the 
door faster, then let's focus the energy towards building consensus 
towards improving the process at RISC-V International.  If that's 
possible, the benefits won't only accrue to Linux developers, but to the 
entire RISC-V hardware and software development community at large.  If 
nothing else, it will be an interesting test of whether RISC-V 
International can take action to address these concerns and balance them 
with those of other stakeholders in the process.


- Paul
Anup Patel April 27, 2021, 6:01 a.m. UTC | #8
Hi Paolo,

Looks like it will take more time for KVM RISC-V to be merged under arch/riscv.

Let's go ahead with your suggestion of having KVM RISC-V under drivers/staging
so that development is not blocked.

I will send out the v18 series, which will add KVM RISC-V under the staging
directory.

Should we target Linux-5.14?

Regards,
Anup

On Tue, Apr 27, 2021 at 11:13 AM Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
> On Fri, 9 Apr 2021, Palmer Dabbelt wrote:
>
> > On Wed, 31 Mar 2021 02:21:58 PDT (-0700), pbonzini@redhat.com wrote:
> >
> > > Palmer, are you okay with merging RISC-V KVM?  Or should we place it in
> > > drivers/staging/riscv/kvm?
> >
> > I'm certainly ready to drop my objections to merging the code based on
> > it targeting a draft extension, but at a bare minimum I want to get a
> > new policy in place that everyone can agree to for merging code.  I've
> > tried to draft up a new policy a handful of times this week, but I'm not
> > really quite sure how to go about this: ultimately trying to build
> > stable interfaces around an unstable ISA is just a losing battle.  I've
> > got a bunch of stuff going on right now, but I'll try to find some time
> > to actually sit down and finish one.
> >
> > I know it might seem odd to complain about how slowly things are going
> > and then throw up another roadblock, but I really do think this is a
> > very important thing to get right.  I'm just not sure how we're going to
> > get anywhere with RISC-V without someone providing stability, so I want
> > to make sure that whatever we do here can be done reliably.  If we don't
> > I'm worried the vendors are just going to go off and do their own
> > software stacks, which will make getting everyone back on the same page
> > very difficult.
>
> I sympathize with Paolo, Anup, and others also.  Especially Anup, who has
> been updating and carrying the hypervisor patches for a long time now.
> And also Greentime, who has been carrying the V extension patches.  The
> RISC-V hypervisor specification, like several other RISC-V draft
> specifications, is taking longer to transition to the officially "frozen"
> stage than almost anyone in the RISC-V community would like.
>
> Since we share this frustration, the next questions are:
>
> - What are the root causes of the problem?
>
> - What's the right forum to address the root causes?
>
> To me, the root causes of the problems described in this thread aren't
> with the arch/riscv kernel maintenance guidelines, but rather with the
> RISC-V specification process itself.  And the right forum to address
> issues with the RISC-V specification process is with RISC-V International
> itself: the mailing lists, the participants, and the board of directors.
> Part of the challenge -- not simply with RISC-V, but with the Linux kernel
> or any other community -- is to ensure that incentives (and disincentives)
> are aligned with the appropriately responsible parts of the community.
> And when it comes to specification development, the right focus to align
> those incentives and disincentives is on RISC-V International.
>
> The arch/riscv patch acceptance guidelines are simply intended to ensure
> that the definition of what is and isn't RISC-V remains clear and
> unambiguous.  Even though the guidelines can result in short-term pain,
> the intention is to promote long-term stability and sustainable
> maintainability - particularly since the specifications get baked into
> hardware.  We've observed that attempting to chase draft specifications
> can cause significant churn: for example, the history of the RISC-V vector
> specification illustrates how a draft extension can undergo major,
> unexpected revisions throughout its journey towards ratification.  One of
> our responsibilities as kernel developers is to minimize that churn - not
> simply for our own sanity, or for the usability of RISC-V, but to ensure
> that we remain members in good standing of the broader kernel community.
> Those of us who were around for the ARM32 and ARM SoC kernel accelerando
> absorbed strong lessons in maintainability, and I doubt anyone here is
> interested in re-learning those the hard way.
>
> RVI states that the association is open to community participation.  The
> organizations that have joined RVI, I believe, have a strong stake in the
> health of the RISC-V ecosystem, just as the folks have here in this
> discussion.  If the goal really is to get quality specifications out the
> door faster, then let's focus the energy towards building consensus
> towards improving the process at RISC-V International.  If that's
> possible, the benefits won't only accrue to Linux developers, but to the
> entire RISC-V hardware and software development community at large.  If
> nothing else, it will be an interesting test of whether RISC-V
> International can take action to address these concerns and balance them
> with those of other stakeholders in the process.
>
>
> - Paul
Paolo Bonzini April 27, 2021, 7:04 a.m. UTC | #9
On 27/04/21 08:01, Anup Patel wrote:
> Hi Paolo,
> 
> Looks like it will take more time for KVM RISC-V to be merged under arch/riscv.
> 
> Let's go ahead with your suggestion of having KVM RISC-V under drivers/staging
> so that development is not blocked.
> 
> I will send out the v18 series, which will add KVM RISC-V under the staging
> directory.
> 
> Should we target Linux-5.14?

Yes, 5.14 is reasonable.  You'll have to adjust the MMU notifiers for 
the new API introduced in 5.13.

Paolo

> Regards,
> Anup
> 
> On Tue, Apr 27, 2021 at 11:13 AM Paul Walmsley <paul.walmsley@sifive.com> wrote:
>>
>> On Fri, 9 Apr 2021, Palmer Dabbelt wrote:
>>
>>> On Wed, 31 Mar 2021 02:21:58 PDT (-0700), pbonzini@redhat.com wrote:
>>>
>>>> Palmer, are you okay with merging RISC-V KVM?  Or should we place it in
>>>> drivers/staging/riscv/kvm?
>>>
>>> I'm certainly ready to drop my objections to merging the code based on
>>> it targeting a draft extension, but at a bare minimum I want to get a
>>> new policy in place that everyone can agree to for merging code.  I've
>>> tried to draft up a new policy a handful of times this week, but I'm not
>>> really quite sure how to go about this: ultimately trying to build
>>> stable interfaces around an unstable ISA is just a losing battle.  I've
>>> got a bunch of stuff going on right now, but I'll try to find some time
>>> to actually sit down and finish one.
>>>
>>> I know it might seem odd to complain about how slowly things are going
>>> and then throw up another roadblock, but I really do think this is a
>>> very important thing to get right.  I'm just not sure how we're going to
>>> get anywhere with RISC-V without someone providing stability, so I want
>>> to make sure that whatever we do here can be done reliably.  If we don't
>>> I'm worried the vendors are just going to go off and do their own
>>> software stacks, which will make getting everyone back on the same page
>>> very difficult.
>>
>> I sympathize with Paolo, Anup, and others also.  Especially Anup, who has
>> been updating and carrying the hypervisor patches for a long time now.
>> And also Greentime, who has been carrying the V extension patches.  The
>> RISC-V hypervisor specification, like several other RISC-V draft
>> specifications, is taking longer to transition to the officially "frozen"
>> stage than almost anyone in the RISC-V community would like.
>>
>> Since we share this frustration, the next questions are:
>>
>> - What are the root causes of the problem?
>>
>> - What's the right forum to address the root causes?
>>
>> To me, the root causes of the problems described in this thread aren't
>> with the arch/riscv kernel maintenance guidelines, but rather with the
>> RISC-V specification process itself.  And the right forum to address
>> issues with the RISC-V specification process is with RISC-V International
>> itself: the mailing lists, the participants, and the board of directors.
>> Part of the challenge -- not simply with RISC-V, but with the Linux kernel
>> or any other community -- is to ensure that incentives (and disincentives)
>> are aligned with the appropriately responsible parts of the community.
>> And when it comes to specification development, the right focus to align
>> those incentives and disincentives is on RISC-V International.
>>
>> The arch/riscv patch acceptance guidelines are simply intended to ensure
>> that the definition of what is and isn't RISC-V remains clear and
>> unambiguous.  Even though the guidelines can result in short-term pain,
>> the intention is to promote long-term stability and sustainable
>> maintainability - particularly since the specifications get baked into
>> hardware.  We've observed that attempting to chase draft specifications
>> can cause significant churn: for example, the history of the RISC-V vector
>> specification illustrates how a draft extension can undergo major,
>> unexpected revisions throughout its journey towards ratification.  One of
>> our responsibilities as kernel developers is to minimize that churn - not
>> simply for our own sanity, or for the usability of RISC-V, but to ensure
>> that we remain members in good standing of the broader kernel community.
>> Those of us who were around for the ARM32 and ARM SoC kernel accelerando
>> absorbed strong lessons in maintainability, and I doubt anyone here is
>> interested in re-learning those the hard way.
>>
>> RVI states that the association is open to community participation.  The
>> organizations that have joined RVI, I believe, have a strong stake in the
>> health of the RISC-V ecosystem, just as the folks have here in this
>> discussion.  If the goal really is to get quality specifications out the
>> door faster, then let's focus the energy towards building consensus
>> towards improving the process at RISC-V International.  If that's
>> possible, the benefits won't only accrue to Linux developers, but to the
>> entire RISC-V hardware and software development community at large.  If
>> nothing else, it will be an interesting test of whether RISC-V
>> International can take action to address these concerns and balance them
>> with those of other stakeholders in the process.
>>
>>
>> - Paul
>
Anup Patel April 28, 2021, 7:07 a.m. UTC | #10
On Tue, Apr 27, 2021 at 12:34 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 27/04/21 08:01, Anup Patel wrote:
> > Hi Paolo,
> >
> > Looks like it will take more time for KVM RISC-V to be merged under arch/riscv.
> >
> > Let's go ahead with your suggestion of having KVM RISC-V under drivers/staging
> > so that development is not blocked.
> >
> > I will send out the v18 series, which will add KVM RISC-V under the
> > staging directory.
> >
> > Should we target Linux-5.14?
>
> Yes, 5.14 is reasonable.  You'll have to adjust the MMU notifiers for
> the new API introduced in 5.13.

Sure, I will rebase on the new API introduced in 5.13.

Regards,
Anup

>
> Paolo
>
> > Regards,
> > Anup
> >
> > On Tue, Apr 27, 2021 at 11:13 AM Paul Walmsley <paul.walmsley@sifive.com> wrote:
> >>
> >> On Fri, 9 Apr 2021, Palmer Dabbelt wrote:
> >>
> >>> On Wed, 31 Mar 2021 02:21:58 PDT (-0700), pbonzini@redhat.com wrote:
> >>>
> >>>> Palmer, are you okay with merging RISC-V KVM?  Or should we place it in
> >>>> drivers/staging/riscv/kvm?
> >>>
> >>> I'm certainly ready to drop my objections to merging the code based on
> >>> it targeting a draft extension, but at a bare minimum I want to get a
> >>> new policy in place that everyone can agree to for merging code.  I've
> >>> tried to draft up a new policy a handful of times this week, but I'm not
> >>> really quite sure how to go about this: ultimately trying to build
> >>> stable interfaces around an unstable ISA is just a losing battle.  I've
> >>> got a bunch of stuff going on right now, but I'll try to find some time
> >>> to actually sit down and finish one.
> >>>
> >>> I know it might seem odd to complain about how slowly things are going
> >>> and then throw up another roadblock, but I really do think this is a
> >>> very important thing to get right.  I'm just not sure how we're going to
> >>> get anywhere with RISC-V without someone providing stability, so I want
> > >>> to make sure that whatever we do here can be done reliably.  If we don't,
> > >>> I'm worried the vendors are just going to go off and do their own
> >>> software stacks, which will make getting everyone back on the same page
> >>> very difficult.
> >>
> >> I sympathize with Paolo, Anup, and others also.  Especially Anup, who has
> >> been updating and carrying the hypervisor patches for a long time now.
> >> And also Greentime, who has been carrying the V extension patches.  The
> >> RISC-V hypervisor specification, like several other RISC-V draft
> >> specifications, is taking longer to transition to the officially "frozen"
> >> stage than almost anyone in the RISC-V community would like.
> >>
> >> Since we share this frustration, the next questions are:
> >>
> >> - What are the root causes of the problem?
> >>
> >> - What's the right forum to address the root causes?
> >>
> >> To me, the root causes of the problems described in this thread aren't
> >> with the arch/riscv kernel maintenance guidelines, but rather with the
> >> RISC-V specification process itself.  And the right forum to address
> >> issues with the RISC-V specification process is with RISC-V International
> >> itself: the mailing lists, the participants, and the board of directors.
> >> Part of the challenge -- not simply with RISC-V, but with the Linux kernel
> >> or any other community -- is to ensure that incentives (and disincentives)
> >> are aligned with the appropriately responsible parts of the community.
> >> And when it comes to specification development, the right focus to align
> >> those incentives and disincentives is on RISC-V International.
> >>
> >> The arch/riscv patch acceptance guidelines are simply intended to ensure
> >> that the definition of what is and isn't RISC-V remains clear and
> >> unambiguous.  Even though the guidelines can result in short-term pain,
> >> the intention is to promote long-term stability and sustainable
> >> maintainability - particularly since the specifications get baked into
> >> hardware.  We've observed that attempting to chase draft specifications
> >> can cause significant churn: for example, the history of the RISC-V vector
> >> specification illustrates how a draft extension can undergo major,
> >> unexpected revisions throughout its journey towards ratification.  One of
> >> our responsibilities as kernel developers is to minimize that churn - not
> >> simply for our own sanity, or for the usability of RISC-V, but to ensure
> >> that we remain members in good standing of the broader kernel community.
> >> Those of us who were around for the ARM32 and ARM SoC kernel accelerando
> >> absorbed strong lessons in maintainability, and I doubt anyone here is
> >> interested in re-learning those the hard way.
> >>
> >> RVI states that the association is open to community participation.  The
> >> organizations that have joined RVI, I believe, have a strong stake in the
> > >> health of the RISC-V ecosystem, just as the folks here in this
> > >> discussion have.  If the goal really is to get quality specifications out
> > >> the door faster, then let's focus the energy on building consensus
> >> towards improving the process at RISC-V International.  If that's
> >> possible, the benefits won't only accrue to Linux developers, but to the
> >> entire RISC-V hardware and software development community at large.  If
> >> nothing else, it will be an interesting test of whether RISC-V
> >> International can take action to address these concerns and balance them
> >> with those of other stakeholders in the process.
> >>
> >>
> >> - Paul
> >
>