
[v18,00/18] KVM RISC-V Support

Message ID 20210519033553.1110536-1-anup.patel@wdc.com (mailing list archive)

Message

Anup Patel May 19, 2021, 3:35 a.m. UTC
From: Anup Patel <anup@brainfault.org>

This series adds initial KVM RISC-V support. Currently, we are able to boot
Linux on RV64/RV32 Guest with multiple VCPUs.

Key aspects of KVM RISC-V added by this series are:
1. No RISC-V specific KVM IOCTL
2. Minimal possible KVM world-switch which touches only GPRs and few CSRs
3. Both RV64 and RV32 host supported
4. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure
5. KVM ONE_REG interface for VCPU register access from user-space
6. PLIC emulation is done in user-space
7. Timer and IPI emulation is done in-kernel
8. Both Sv39x4 and Sv48x4 supported for RV64 host
9. MMU notifiers supported
10. Generic dirtylog supported
11. FP lazy save/restore supported
12. SBI v0.1 emulation for KVM Guest available
13. Forward unhandled SBI calls to KVM userspace
14. Hugepage support for Guest/VM
15. IOEVENTFD support for Vhost

Here's a brief TODO list which we will work upon after this series:
1. SBI v0.2 emulation in-kernel
2. SBI v0.2 hart state management emulation in-kernel
3. In-kernel PLIC emulation
4. ..... and more .....

This series can be found in riscv_kvm_v18 branch at:
https://github.com/avpatel/linux.git

Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v7 branch
at: https://github.com/avpatel/kvmtool.git

The QEMU RISC-V hypervisor emulation is done by Alistair and is available
in master branch at: https://git.qemu.org/git/qemu.git

To play around with KVM RISC-V, refer to the KVM RISC-V wiki at:
https://github.com/kvm-riscv/howto/wiki
https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-QEMU
https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-Spike

Changes since v17:
 - Rebased on Linux-5.13-rc2
 - Moved to new KVM MMU notifier APIs
 - Removed redundant kvm_arch_vcpu_uninit()
 - Moved KVM RISC-V sources to drivers/staging for compliance with
   Linux RISC-V patch acceptance policy

Changes since v16:
 - Rebased on Linux-5.12-rc5
 - Remove redundant kvm_arch_create_memslot(), kvm_arch_vcpu_setup(),
   kvm_arch_vcpu_init(), kvm_arch_has_vcpu_debugfs(), and
   kvm_arch_create_vcpu_debugfs() from PATCH5
 - Make stage2_wp_memory_region() and stage2_ioremap() as static
   in PATCH13

Changes since v15:
 - Rebased on Linux-5.11-rc3
 - Fixed kvm_stage2_map() to use gfn_to_pfn_prot() for determining
   writability of a host pfn.
 - Use "__u64" in-place of "u64" and "__u32" in-place of "u32" for
   uapi/asm/kvm.h

Changes since v14:
 - Rebased on Linux-5.10-rc3
 - Fixed Stage2 (G-stage) PGD allocation to ensure it is 16KB aligned

Changes since v13:
 - Rebased on Linux-5.9-rc3
 - Fixed kvm_riscv_vcpu_set_reg_csr() for SIP updates in PATCH5
 - Fixed instruction length computation in PATCH7
 - Added ioeventfd support in PATCH7
 - Ensure HSTATUS.SPVP is set to the correct value before using HLV/HSV
   instructions in PATCH7
 - Fixed stage2_map_page() to set PTE 'A' and 'D' bits correctly
   in PATCH10
 - Added stage2 dirty page logging in PATCH10
 - Allow KVM user-space to SET/GET SCOUNTEREN CSR in PATCH5
 - Save/restore SCOUNTEREN in PATCH6
 - Reduced quite a few instructions for __kvm_riscv_switch_to() by
   using CSR swap instruction in PATCH6
 - Detect and use Sv48x4 when available in PATCH10

Changes since v12:
 - Rebased patches on Linux-5.8-rc4
 - By default enable all counters in HCOUNTEREN
 - RISC-V H-Extension v0.6.1 spec support

Changes since v11:
 - Rebased patches on Linux-5.7-rc3
 - Fixed typo in typecast of stage2_map_size define
 - Introduced struct kvm_cpu_trap to represent trap details and
   use it as function parameter wherever applicable
 - Pass memslot to kvm_riscv_stage2_map() for supporting dirty page
   logging in future
 - RISC-V H-Extension v0.6 spec support
 - Send-out first three patches as separate series so that it can
   be taken by Palmer for Linux RISC-V

Changes since v10:
 - Rebased patches on Linux-5.6-rc5
 - Reduce RISCV_ISA_EXT_MAX from 256 to 64
 - Separate PATCH for removing N-extension related defines
 - Added comments as requested by Palmer
 - Fixed HIDELEG CSR programming

Changes since v9:
 - Rebased patches on Linux-5.5-rc3
 - Squash PATCH19 and PATCH20 into PATCH5
 - Squash PATCH18 into PATCH11
 - Squash PATCH17 into PATCH16
 - Added ONE_REG interface for VCPU timer in PATCH13
 - Use HTIMEDELTA for VCPU timer in PATCH13
 - Updated KVM RISC-V mailing list in MAINTAINERS entry
 - Update KVM kconfig option to depend on RISCV_SBI and MMU
 - Check for SBI v0.2 and SBI v0.2 RFENCE extension at boot-time
 - Use SBI v0.2 RFENCE extension in VMID implementation
 - Use SBI v0.2 RFENCE extension in Stage2 MMU implementation
 - Use SBI v0.2 RFENCE extension in SBI implementation
 - Moved to RISC-V Hypervisor v0.5 draft spec
 - Updated Documentation/virt/kvm/api.txt for timer ONE_REG interface

Changes since v8:
 - Rebased series on Linux-5.4-rc3 and Atish's SBI v0.2 patches
 - Use HRTIMER_MODE_REL instead of HRTIMER_MODE_ABS in timer emulation
 - Fixed kvm_riscv_stage2_map() to handle hugepages
 - Added patch to forward unhandled SBI calls to user-space
 - Added patch for iterative/recursive stage2 page table programming
 - Added patch to remove per-CPU vsip_shadow variable
 - Added patch to fix race-condition in kvm_riscv_vcpu_sync_interrupts()

Changes since v7:
 - Rebased series on Linux-5.4-rc1 and Atish's SBI v0.2 patches
 - Removed PATCH1, PATCH3, and PATCH20 because these already merged
 - Use kernel doc style comments for ISA bitmap functions
 - Don't parse X, Y, and Z extensions in riscv_fill_hwcap() because they
   will be added in the future
 - Mark KVM RISC-V kconfig option as EXPERIMENTAL
 - Typo fix in commit description of PATCH6 of v7 series
 - Use separate structs for CORE and CSR registers of ONE_REG interface
 - Explicitly include asm/sbi.h in kvm/vcpu_sbi.c
 - Removed implicit switch-case fall-through in kvm_riscv_vcpu_exit()
 - No need to set VSSTATUS.MXR bit in kvm_riscv_vcpu_unpriv_read()
 - Removed register for instruction length in kvm_riscv_vcpu_unpriv_read()
 - Added defines for checking/decoding instruction length
 - Added separate patch to forward unhandled SBI calls to userspace tool

Changes since v6:
 - Rebased patches on Linux-5.3-rc7
 - Added "return_handled" in struct kvm_mmio_decode to ensure that
   kvm_riscv_vcpu_mmio_return() updates SEPC only once
 - Removed trap_stval parameter from kvm_riscv_vcpu_unpriv_read()
 - Updated git repo URL in MAINTAINERS entry

Changes since v5:
 - Renamed KVM_REG_RISCV_CONFIG_TIMEBASE register to
   KVM_REG_RISCV_CONFIG_TBFREQ register in ONE_REG interface
 - Update SEPC in kvm_riscv_vcpu_mmio_return() for MMIO exits
 - Use switch case instead of illegal instruction opcode table for simplicity
 - Improve comments in stage2_remote_tlb_flush() for a potential remote TLB
   flush optimization
 - Handle all unsupported SBI calls in default case of
   kvm_riscv_vcpu_sbi_ecall() function
 - Fixed kvm_riscv_vcpu_sync_interrupts() for software interrupts
 - Improved unprivileged reads to handle traps due to Guest stage1 page table
 - Added separate patch to document RISC-V specific things in
   Documentation/virt/kvm/api.txt

Changes since v4:
 - Rebased patches on Linux-5.3-rc5
 - Added Paolo's Acked-by and Reviewed-by
 - Updated mailing list in MAINTAINERS entry

Changes since v3:
 - Moved patch for ISA bitmap from KVM prep series to this series
 - Made vsip_shadow a run-time percpu variable instead of compile-time
 - Flush Guest TLBs on all Host CPUs whenever we run-out of VMIDs

Changes since v2:
 - Removed references of KVM_REQ_IRQ_PENDING from all patches
 - Use kvm->srcu within in-kernel KVM run loop
 - Added percpu vsip_shadow to track last value programmed in VSIP CSR
 - Added comments about irqs_pending and irqs_pending_mask
 - Used kvm_arch_vcpu_runnable() in-place-of kvm_riscv_vcpu_has_interrupt()
   in system_opcode_insn()
 - Removed unwanted smp_wmb() in kvm_riscv_stage2_vmid_update()
 - Use kvm_flush_remote_tlbs() in kvm_riscv_stage2_vmid_update()
 - Use READ_ONCE() in kvm_riscv_stage2_update_hgatp() for vmid

Changes since v1:
 - Fixed compile errors in building KVM RISC-V as module
 - Removed unused kvm_riscv_halt_guest() and kvm_riscv_resume_guest()
 - Set KVM_CAP_SYNC_MMU capability only after MMU notifiers are implemented
 - Made vmid_version an unsigned long instead of atomic
 - Renamed KVM_REQ_UPDATE_PGTBL to KVM_REQ_UPDATE_HGATP
 - Renamed kvm_riscv_stage2_update_pgtbl() to kvm_riscv_stage2_update_hgatp()
 - Configure HIDELEG and HEDELEG in kvm_arch_hardware_enable()
 - Updated ONE_REG interface for CSR access to user-space
 - Removed irqs_pending_lock and use atomic bitops instead
 - Added separate patch for FP ONE_REG interface
 - Added separate patch for updating MAINTAINERS file

Anup Patel (14):
  RISC-V: Add hypervisor extension related CSR defines
  RISC-V: Add initial skeletal KVM support
  RISC-V: KVM: Implement VCPU create, init and destroy functions
  RISC-V: KVM: Implement VCPU interrupts and requests handling
  RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  RISC-V: KVM: Implement VCPU world-switch
  RISC-V: KVM: Handle MMIO exits for VCPU
  RISC-V: KVM: Handle WFI exits for VCPU
  RISC-V: KVM: Implement VMID allocator
  RISC-V: KVM: Implement stage2 page table programming
  RISC-V: KVM: Implement MMU notifiers
  RISC-V: KVM: Document RISC-V specific parts of KVM API
  RISC-V: KVM: Move sources to drivers/staging directory
  RISC-V: KVM: Add MAINTAINERS entry

Atish Patra (4):
  RISC-V: KVM: Add timer functionality
  RISC-V: KVM: FP lazy save/restore
  RISC-V: KVM: Implement ONE REG interface for FP registers
  RISC-V: KVM: Add SBI v0.1 support

 Documentation/virt/kvm/api.rst                | 193 +++-
 MAINTAINERS                                   |  11 +
 arch/riscv/Kconfig                            |   1 +
 arch/riscv/Makefile                           |   1 +
 arch/riscv/include/uapi/asm/kvm.h             | 128 +++
 drivers/clocksource/timer-riscv.c             |   9 +
 drivers/staging/riscv/kvm/Kconfig             |  36 +
 drivers/staging/riscv/kvm/Makefile            |  23 +
 drivers/staging/riscv/kvm/asm/kvm_csr.h       | 105 ++
 drivers/staging/riscv/kvm/asm/kvm_host.h      | 271 +++++
 drivers/staging/riscv/kvm/asm/kvm_types.h     |   7 +
 .../staging/riscv/kvm/asm/kvm_vcpu_timer.h    |  44 +
 drivers/staging/riscv/kvm/main.c              | 118 +++
 drivers/staging/riscv/kvm/mmu.c               | 802 ++++++++++++++
 drivers/staging/riscv/kvm/riscv_offsets.c     | 170 +++
 drivers/staging/riscv/kvm/tlb.S               |  74 ++
 drivers/staging/riscv/kvm/vcpu.c              | 997 ++++++++++++++++++
 drivers/staging/riscv/kvm/vcpu_exit.c         | 701 ++++++++++++
 drivers/staging/riscv/kvm/vcpu_sbi.c          | 173 +++
 drivers/staging/riscv/kvm/vcpu_switch.S       | 401 +++++++
 drivers/staging/riscv/kvm/vcpu_timer.c        | 225 ++++
 drivers/staging/riscv/kvm/vm.c                |  81 ++
 drivers/staging/riscv/kvm/vmid.c              | 120 +++
 include/clocksource/timer-riscv.h             |  16 +
 include/uapi/linux/kvm.h                      |   8 +
 25 files changed, 4706 insertions(+), 9 deletions(-)
 create mode 100644 arch/riscv/include/uapi/asm/kvm.h
 create mode 100644 drivers/staging/riscv/kvm/Kconfig
 create mode 100644 drivers/staging/riscv/kvm/Makefile
 create mode 100644 drivers/staging/riscv/kvm/asm/kvm_csr.h
 create mode 100644 drivers/staging/riscv/kvm/asm/kvm_host.h
 create mode 100644 drivers/staging/riscv/kvm/asm/kvm_types.h
 create mode 100644 drivers/staging/riscv/kvm/asm/kvm_vcpu_timer.h
 create mode 100644 drivers/staging/riscv/kvm/main.c
 create mode 100644 drivers/staging/riscv/kvm/mmu.c
 create mode 100644 drivers/staging/riscv/kvm/riscv_offsets.c
 create mode 100644 drivers/staging/riscv/kvm/tlb.S
 create mode 100644 drivers/staging/riscv/kvm/vcpu.c
 create mode 100644 drivers/staging/riscv/kvm/vcpu_exit.c
 create mode 100644 drivers/staging/riscv/kvm/vcpu_sbi.c
 create mode 100644 drivers/staging/riscv/kvm/vcpu_switch.S
 create mode 100644 drivers/staging/riscv/kvm/vcpu_timer.c
 create mode 100644 drivers/staging/riscv/kvm/vm.c
 create mode 100644 drivers/staging/riscv/kvm/vmid.c
 create mode 100644 include/clocksource/timer-riscv.h

Comments

Greg Kroah-Hartman May 19, 2021, 4:58 a.m. UTC | #1
On Wed, May 19, 2021 at 09:05:35AM +0530, Anup Patel wrote:
> From: Anup Patel <anup@brainfault.org>
> 
> This series adds initial KVM RISC-V support. Currently, we are able to boot
> Linux on RV64/RV32 Guest with multiple VCPUs.
> 
> Key aspects of KVM RISC-V added by this series are:
> 1. No RISC-V specific KVM IOCTL
> 2. Minimal possible KVM world-switch which touches only GPRs and few CSRs
> 3. Both RV64 and RV32 host supported
> 4. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure
> 5. KVM ONE_REG interface for VCPU register access from user-space
> 6. PLIC emulation is done in user-space
> 7. Timer and IPI emulation is done in-kernel
> 8. Both Sv39x4 and Sv48x4 supported for RV64 host
> 9. MMU notifiers supported
> 10. Generic dirtylog supported
> 11. FP lazy save/restore supported
> 12. SBI v0.1 emulation for KVM Guest available
> 13. Forward unhandled SBI calls to KVM userspace
> 14. Hugepage support for Guest/VM
> 15. IOEVENTFD support for Vhost
> 
> Here's a brief TODO list which we will work upon after this series:
> 1. SBI v0.2 emulation in-kernel
> 2. SBI v0.2 hart state management emulation in-kernel
> 3. In-kernel PLIC emulation
> 4. ..... and more .....
> 
> This series can be found in riscv_kvm_v18 branch at:
> https://github.com/avpatel/linux.git
> 
> Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v7 branch
> at: https://github.com/avpatel/kvmtool.git
> 
> The QEMU RISC-V hypervisor emulation is done by Alistair and is available
> in master branch at: https://git.qemu.org/git/qemu.git
> 
> To play around with KVM RISC-V, refer to the KVM RISC-V wiki at:
> https://github.com/kvm-riscv/howto/wiki
> https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-QEMU
> https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-Spike
> 
> Changes since v17:
>  - Rebased on Linux-5.13-rc2
>  - Moved to new KVM MMU notifier APIs
>  - Removed redundant kvm_arch_vcpu_uninit()
>  - Moved KVM RISC-V sources to drivers/staging for compliance with
>    Linux RISC-V patch acceptance policy

What is this new "patch acceptance policy" and what does it have to do
with drivers/staging?

What does drivers/staging/ have to do with this at all?  Did anyone ask
the staging maintainer about this?

Not cool, and not something I'm about to take without some very good
reasons...

greg k-h
Anup Patel May 19, 2021, 5:10 a.m. UTC | #2
On Wed, May 19, 2021 at 10:28 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Wed, May 19, 2021 at 09:05:35AM +0530, Anup Patel wrote:
> > From: Anup Patel <anup@brainfault.org>
> >
> > [...]
> >
> > Changes since v17:
> >  - Rebased on Linux-5.13-rc2
> >  - Moved to new KVM MMU notifier APIs
> >  - Removed redundant kvm_arch_vcpu_uninit()
> >  - Moved KVM RISC-V sources to drivers/staging for compliance with
> >    Linux RISC-V patch acceptance policy
>
> What is this new "patch acceptance policy" and what does it have to do
> with drivers/staging?

The Linux RISC-V patch acceptance policy is here:
Documentation/riscv/patch-acceptance.rst

As-per this policy, the Linux RISC-V maintainers will only accept
patches for frozen/ratified RISC-V extensions. Basically, it links the
Linux RISC-V development process with the RISC-V foundation
process which is painfully slow.

The KVM RISC-V patches have been sitting on the lists for almost
2 years now. The requirements for freezing RISC-V H-extension
(hypervisor extension) keeps changing and we are not clear when
it will be frozen. In fact, quite a few people have already implemented
RISC-V H-extension in hardware as well and KVM RISC-V works
on real HW as well.

Rationale of moving KVM RISC-V to drivers/staging is to continue
KVM RISC-V development without breaking the Linux RISC-V patch
acceptance policy until RISC-V H-extension is frozen. Once, RISC-V
H-extension is frozen we will move KVM RISC-V back to arch/riscv
(like other architectures).

>
> What does drivers/staging/ have to do with this at all?  Did anyone ask
> the staging maintainer about this?

Yes, Paolo (KVM maintainer) suggested having KVM RISC-V under
drivers/staging until RISC-V H-extension is frozen and continue the
KVM RISC-V development from there.

>
> Not cool, and not something I'm about to take without some very good
> reasons...

Regards,
Anup
Greg Kroah-Hartman May 19, 2021, 5:21 a.m. UTC | #3
On Wed, May 19, 2021 at 10:40:13AM +0530, Anup Patel wrote:
> On Wed, May 19, 2021 at 10:28 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, May 19, 2021 at 09:05:35AM +0530, Anup Patel wrote:
> > > From: Anup Patel <anup@brainfault.org>
> > >
> > > [...]
> > >
> > > Changes since v17:
> > >  - Rebased on Linux-5.13-rc2
> > >  - Moved to new KVM MMU notifier APIs
> > >  - Removed redundant kvm_arch_vcpu_uninit()
> > >  - Moved KVM RISC-V sources to drivers/staging for compliance with
> > >    Linux RISC-V patch acceptance policy
> >
> > What is this new "patch acceptance policy" and what does it have to do
> > with drivers/staging?
> 
> The Linux RISC-V patch acceptance policy is here:
> Documentation/riscv/patch-acceptance.rst
> 
> As-per this policy, the Linux RISC-V maintainers will only accept
> patches for frozen/ratified RISC-V extensions. Basically, it links the
> Linux RISC-V development process with the RISC-V foundation
> process which is painfully slow.
> 
> The KVM RISC-V patches have been sitting on the lists for almost
> 2 years now. The requirements for freezing RISC-V H-extension
> (hypervisor extension) keeps changing and we are not clear when
> it will be frozen. In fact, quite a few people have already implemented
> RISC-V H-extension in hardware as well and KVM RISC-V works
> on real HW as well.
> 
> Rationale of moving KVM RISC-V to drivers/staging is to continue
> KVM RISC-V development without breaking the Linux RISC-V patch
> acceptance policy until RISC-V H-extension is frozen. Once, RISC-V
> H-extension is frozen we will move KVM RISC-V back to arch/riscv
> (like other architectures).

Wait, no, this has nothing to do with what drivers/staging/ is for and
how it is used.  Again, not ok.

> > What does drivers/staging/ have to do with this at all?  Did anyone ask
> > the staging maintainer about this?
> 
> Yes, Paolo (KVM maintainer) suggested having KVM RISC-V under
> drivers/staging until RISC-V H-extension is frozen and continue the
> KVM RISC-V development from there.

staging is not for stuff like this at all.  It is for code that is
self-contained (not this) and needs work to get merged into the main
part of the kernel (listed in a TODO file, and is not this).

It is not a dumping ground for stuff that arch maintainers can not seem
to agree on, and it is not a place where you can just randomly play
around with user/kernel apis with no consequences.

So no, sorry, not going to take this code at all.

greg k-h
Greg Kroah-Hartman May 19, 2021, 10:47 a.m. UTC | #4
On Wed, May 19, 2021 at 07:21:54AM +0200, Greg Kroah-Hartman wrote:
> On Wed, May 19, 2021 at 10:40:13AM +0530, Anup Patel wrote:
> > On Wed, May 19, 2021 at 10:28 AM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Wed, May 19, 2021 at 09:05:35AM +0530, Anup Patel wrote:
> > > > From: Anup Patel <anup@brainfault.org>
> > > >
> > > > [...]
> > > >
> > > > Changes since v17:
> > > >  - Rebased on Linux-5.13-rc2
> > > >  - Moved to new KVM MMU notifier APIs
> > > >  - Removed redundant kvm_arch_vcpu_uninit()
> > > >  - Moved KVM RISC-V sources to drivers/staging for compliance with
> > > >    Linux RISC-V patch acceptance policy
> > >
> > > What is this new "patch acceptance policy" and what does it have to do
> > > with drivers/staging?
> > 
> > The Linux RISC-V patch acceptance policy is here:
> > Documentation/riscv/patch-acceptance.rst
> > 
> > As-per this policy, the Linux RISC-V maintainers will only accept
> > patches for frozen/ratified RISC-V extensions. Basically, it links the
> > Linux RISC-V development process with the RISC-V foundation
> > process which is painfully slow.
> > 
> > The KVM RISC-V patches have been sitting on the lists for almost
> > 2 years now. The requirements for freezing RISC-V H-extension
> > (hypervisor extension) keeps changing and we are not clear when
> > it will be frozen. In fact, quite a few people have already implemented
> > RISC-V H-extension in hardware as well and KVM RISC-V works
> > on real HW as well.
> > 
> > Rationale of moving KVM RISC-V to drivers/staging is to continue
> > KVM RISC-V development without breaking the Linux RISC-V patch
> > acceptance policy until RISC-V H-extension is frozen. Once, RISC-V
> > H-extension is frozen we will move KVM RISC-V back to arch/riscv
> > (like other architectures).
> 
> Wait, no, this has nothing to do with what drivers/staging/ is for and
> how it is used.  Again, not ok.
> 
> > > What does drivers/staging/ have to do with this at all?  Did anyone ask
> > > the staging maintainer about this?
> > 
> > Yes, Paolo (KVM maintainer) suggested having KVM RISC-V under
> > drivers/staging until RISC-V H-extension is frozen and continue the
> > KVM RISC-V development from there.
> 
> staging is not for stuff like this at all.  It is for code that is
> self-contained (not this) and needs work to get merged into the main
> part of the kernel (listed in a TODO file, and is not this).
> 
> It is not a dumping ground for stuff that arch maintainers can not seem
> to agree on, and it is not a place where you can just randomly play
> around with user/kernel apis with no consequences.
> 
> So no, sorry, not going to take this code at all.

And to be a bit more clear about this, having other subsystem
maintainers drop their unwanted code on this subsystem, _without_ even
asking me first is just not very nice.  All of a sudden I am now
responsible for this stuff, without me even being asked about it.
Should I start throwing random drivers into the kvm subsystem for them
to maintain because I don't want to?  :)

If there's really no other way to do this, than to put it in staging,
let's talk about it.  But saying "this must go here" is not a
conversation...

thanks,

greg k-h
Paolo Bonzini May 19, 2021, 11:18 a.m. UTC | #5
On 19/05/21 12:47, Greg Kroah-Hartman wrote:
>> It is not a dumping ground for stuff that arch maintainers can not seem
>> to agree on, and it is not a place where you can just randomly play
>> around with user/kernel apis with no consequences.
>>
>> So no, sorry, not going to take this code at all.
>
> And to be a bit more clear about this, having other subsystem
> maintainers drop their unwanted code on this subsystem, _without_ even
> asking me first is just not very nice.  All of a sudden I am now
> responsible for this stuff, without me even being asked about it.
> Should I start throwing random drivers into the kvm subsystem for them
> to maintain because I don't want to?  :)

(I did see the smiley), I'm on board with throwing random drivers in 
arch/riscv. :)

The situation here didn't seem very far from what process/2.Process.rst 
says about staging:

- "a way to keep track of drivers that aren't up to standards", though 
in this case the issue is not coding standards or quality---the code is 
very good---and which people "may want to use"

- the code could be removed if there's no progress on either changing 
the RISC-V acceptance policy or ratifying the spec

Of course there should have been a TODO file explaining the situation. 
But if you think this is not the right place, I totally understand; if 
my opinion had any weight in this, I would just place it in arch/riscv/kvm.

The RISC-V acceptance policy as is just doesn't work, and the fact that 
people are trying to work around it is proving it.  There are many ways 
to improve it:

- get rid of it;

- provide a path to get an exception;

- provide a staging place so that people can do their job of contributing 
code to Linux (e.g. arch/riscv/staging/kvm).

If everything else fails, I guess we can place it in 
drivers/virt/riscv/kvm, even though that's just as silly a workaround. 
It's a pity because the RISC-V virtualization architecture has a very 
nice design, and the KVM code is also a very good example of how to do 
things right.

Paolo

> If there's really no other way to do this, than to put it in staging,
> let's talk about it.  But saying "this must go here" is not a
> conversation...
Greg Kroah-Hartman May 19, 2021, 12:23 p.m. UTC | #6
On Wed, May 19, 2021 at 01:18:44PM +0200, Paolo Bonzini wrote:
> On 19/05/21 12:47, Greg Kroah-Hartman wrote:
> > > It is not a dumping ground for stuff that arch maintainers can not seem
> > > to agree on, and it is not a place where you can just randomly play
> > > around with user/kernel apis with no consequences.
> > > 
> > > So no, sorry, not going to take this code at all.
> > 
> > And to be a bit more clear about this, having other subsystem
> > maintainers drop their unwanted code on this subsystem, _without_ even
> > asking me first is just not very nice.  All of a sudden I am now
> > responsible for this stuff, without me even being asked about it.
> > Should I start throwing random drivers into the kvm subsystem for them
> > to maintain because I don't want to?  :)
> 
> (I did see the smiley), I'm on board with throwing random drivers in
> arch/riscv. :)
> 
> The situation here didn't seem very far from what process/2.Process.rst says
> about staging:
> 
> - "a way to keep track of drivers that aren't up to standards", though in
> this case the issue is not coding standards or quality---the code is very
> good---and which people "may want to use"

Exactly, this is different.  And it's not self-contained, which is
another requirement for staging code that we have learned to enforce
over the years (makes it easier to rip out if no one is willing to
maintain it.)

> - the code could be removed if there's no progress on either changing the
> RISC-V acceptance policy or ratifying the spec

I really do not understand the issue here, why can this just not be
merged normally?

Is the code somehow not ok?  Is it relying on hardware in ways that
breaks other users?  Does it cause problems for different users?  Is it
a user api that you don't like or think is the "proper" one?

All staging drivers need a TODO list that shows what needs to be done in
order to get it out of staging.  All I can tell so far is that the riscv
maintainers do not want to take this for "unknown reasons" so let's dump
it over here for now where we don't have to see it.

And that's not good for developers or users, so perhaps the riscv rules
are not very good?

> Of course there should have been a TODO file explaining the situation. But
> if you think this is not the right place, I totally understand; if my
> opinion had any weight in this, I would just place it in arch/riscv/kvm.
> 
> The RISC-V acceptance policy as is just doesn't work, and the fact that
> people are trying to work around it is proving it.  There are many ways to
> improve it:

What is this magical acceptance policy that is preventing working code
from being merged?  And why is it suddenly the rest of the kernel
developer's problems because of this?

thanks,

greg k-h
Paolo Bonzini May 19, 2021, 1:29 p.m. UTC | #7
On 19/05/21 14:23, Greg Kroah-Hartman wrote:
>> - the code could be removed if there's no progress on either changing the
>> RISC-V acceptance policy or ratifying the spec
> 
> I really do not understand the issue here, why can this just not be
> merged normally?

Because the RISC-V people only want to merge code for "frozen" or 
"ratified" processor extensions, and the RISC-V foundation is dragging 
their feet in ratifying the hypervisor extension.

It's totally a self-inflicted pain on part of the RISC-V maintainers; 
see Documentation/riscv/patch-acceptance.rst:

   We'll only accept patches for new modules or extensions if the
   specifications for those modules or extensions are listed as being
   "Frozen" or "Ratified" by the RISC-V Foundation.  (Developers may, of
   course, maintain their own Linux kernel trees that contain code for
   any draft extensions that they wish.)

(Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/riscv/patch-acceptance.rst)

> All staging drivers need a TODO list that shows what needs to be done in
> order to get it out of staging.  All I can tell so far is that the riscv
> maintainers do not want to take this for "unknown reasons" so let's dump
> it over here for now where we don't have to see it.
> 
> And that's not good for developers or users, so perhaps the riscv rules
> are not very good?

I agree wholeheartedly.

I have heard contrasting opinions on conflict of interest where the 
employers of the maintainers benefit from slowing down the integration 
of code in Linus's tree.  I find these allegations believable, but even 
if that weren't the case, the policy is (to put it kindly) showing its 
limits.

>> Of course there should have been a TODO file explaining the situation. But
>> if you think this is not the right place, I totally understand; if my
>> opinion had any weight in this, I would just place it in arch/riscv/kvm.
>>
>> The RISC-V acceptance policy as is just doesn't work, and the fact that
>> people are trying to work around it is proving it.  There are many ways to
>> improve it:
> 
> What is this magical acceptance policy that is preventing working code
> from being merged?  And why is it suddenly the rest of the kernel
> developer's problems because of this?

It is my problem because I am trying to help Anup merge some perfectly 
good KVM code; when a new KVM port comes up, I coordinate merging the 
first arch/*/kvm bits with the arch/ maintainers and from that point on 
that directory becomes "mine" (or my submaintainers').

Paolo
Greg Kroah-Hartman May 19, 2021, 1:58 p.m. UTC | #8
On Wed, May 19, 2021 at 03:29:24PM +0200, Paolo Bonzini wrote:
> On 19/05/21 14:23, Greg Kroah-Hartman wrote:
> > > - the code could be removed if there's no progress on either changing the
> > > RISC-V acceptance policy or ratifying the spec
> > 
> > I really do not understand the issue here, why can this just not be
> > merged normally?
> 
> Because the RISC-V people only want to merge code for "frozen" or "ratified"
> processor extensions, and the RISC-V foundation is dragging their feet in
> ratifying the hypervisor extension.
> 
> It's totally a self-inflicted pain on part of the RISC-V maintainers; see
> Documentation/riscv/patch-acceptance.rst:
> 
>   We'll only accept patches for new modules or extensions if the
>   specifications for those modules or extensions are listed as being
>   "Frozen" or "Ratified" by the RISC-V Foundation.  (Developers may, of
>   course, maintain their own Linux kernel trees that contain code for
>   any draft extensions that they wish.)
> 
> (Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/riscv/patch-acceptance.rst)

Lovely, and how is that going to work for code that lives outside of the
riscv "arch" layer?  Like all drivers?

And what exactly is "not ratified" that these patches take advantage of?
If there is hardware out there with these features, well, Linux needs to
run on it, so we need to support that.  No external committee rules
should be relevant here.

Now if this is for hardware that is not "real", then that's a different
story.  In that case, who cares, no one can use it, so why not take it?

So what exactly is this trying to "protect" Linux from?

> > All staging drivers need a TODO list that shows what needs to be done in
> > order to get it out of staging.  All I can tell so far is that the riscv
> > maintainers do not want to take this for "unknown reasons" so let's dump
> > it over here for now where we don't have to see it.
> > 
> > And that's not good for developers or users, so perhaps the riscv rules
> > are not very good?
> 
> I agree wholeheartedly.
> 
> I have heard contrasting opinions on conflict of interest where the
> employers of the maintainers benefit from slowing down the integration of
> code in Linus's tree.  I find these allegations believable, but even if that
> weren't the case, the policy is (to put it kindly) showing its limits.

Slowing down code merges is horrible, again, if there's hardware out
there, and someone sends code to support it, and wants to maintain it,
then we should not be rejecting it.

Otherwise we are not doing our job as an operating system kernel, our
role is to make hardware work.  We don't get to just ignore code because
we don't like the hardware (oh if only we could!), if a user wants to
use it, our role is to handle that.

> > > Of course there should have been a TODO file explaining the situation. But
> > > if you think this is not the right place, I totally understand; if my
> > > opinion had any weight in this, I would just place it in arch/riscv/kvm.
> > > 
> > > The RISC-V acceptance policy as is just doesn't work, and the fact that
> > > people are trying to work around it is proving it.  There are many ways to
> > > improve it:
> > 
> > What is this magical acceptance policy that is preventing working code
> > from being merged?  And why is it suddenly the rest of the kernel
> > developer's problems because of this?
> 
> It is my problem because I am trying to help Anup merge some perfectly
> good KVM code; when a new KVM port comes up, I coordinate merging the first
> arch/*/kvm bits with the arch/ maintainers and from that point on that
> directory becomes "mine" (or my submaintainers').

Agreed, but the riscv maintainers should not be forcing this "problem"
onto all of us, like it seems is starting to happen :(

Ok, so, Paul, Palmer, and Albert, what do we do here?  Why can't we take
working code like this into the kernel if someone is willing to support
and maintain it over time?

thanks,

greg k-h
Dan Carpenter May 19, 2021, 3:08 p.m. UTC | #9
It's sort of frustrating that none of this information was in the commit
message.

"This code is not ready to be merged into the arch/risc/ directory
because the RISC-V Foundation has not certified the hardware spec yet.
However, the following chips have implemented it ABC12345, ABC6789 and
they've already shipping to thousands of customers since blah blah blah
so we should support it."

I honestly thought it was an issue with the code or the userspace API.

regards,
dan carpenter
Paolo Bonzini May 19, 2021, 3:26 p.m. UTC | #10
On 19/05/21 17:08, Dan Carpenter wrote:
> It's sort of frustrating that none of this information was in the commit
> message.
> 
> "This code is not ready to be merged into the arch/risc/ directory
> because the RISC-V Foundation has not certified the hardware spec yet.
> However, the following chips have implemented it ABC12345, ABC6789 and
> they've already shipping to thousands of customers since blah blah blah
> so we should support it."
> 
> I honestly thought it was an issue with the code or the userspace API.

Yes, I was expecting this to be in the staging TODO file - I should have 
been more explicit with Anup.

Paolo
Palmer Dabbelt May 21, 2021, 5:13 p.m. UTC | #11
On Wed, 19 May 2021 06:58:05 PDT (-0700), Greg KH wrote:
> On Wed, May 19, 2021 at 03:29:24PM +0200, Paolo Bonzini wrote:
>> On 19/05/21 14:23, Greg Kroah-Hartman wrote:
>> > > - the code could be removed if there's no progress on either changing the
>> > > RISC-V acceptance policy or ratifying the spec
>> >
>> > I really do not understand the issue here, why can this just not be
>> > merged normally?
>>
>> Because the RISC-V people only want to merge code for "frozen" or "ratified"
>> processor extensions, and the RISC-V foundation is dragging their feet in
>> ratifying the hypervisor extension.
>>
>> It's totally a self-inflicted pain on part of the RISC-V maintainers; see
>> Documentation/riscv/patch-acceptance.rst:
>>
>>   We'll only accept patches for new modules or extensions if the
>>   specifications for those modules or extensions are listed as being
>>   "Frozen" or "Ratified" by the RISC-V Foundation.  (Developers may, of
>>   course, maintain their own Linux kernel trees that contain code for
>>   any draft extensions that they wish.)
>>
>> (Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/riscv/patch-acceptance.rst)
>
> Lovely, and how is that going to work for code that lives outside of the
> riscv "arch" layer?  Like all drivers?
>
> And what exactly is "not ratified" that these patches take advantage of?
> If there is hardware out there with these features, well, Linux needs to
> run on it, so we need to support that.  No external committee rules
> should be relevant here.
>
> Now if this is for hardware that is not "real", then that's a different
> story.  In that case, who cares, no one can use it, so why not take it?
>
> So what exactly is this trying to "protect" Linux from?
>
>> > All staging drivers need a TODO list that shows what needs to be done in
>> > order to get it out of staging.  All I can tell so far is that the riscv
>> > maintainers do not want to take this for "unknown reasons" so let's dump
>> > it over here for now where we don't have to see it.
>> >
>> > And that's not good for developers or users, so perhaps the riscv rules
>> > are not very good?
>>
>> I agree wholeheartedly.
>>
>> I have heard contrasting opinions on conflict of interest where the
>> employers of the maintainers benefit from slowing down the integration of
>> code in Linus's tree.  I find these allegations believable, but even if that
>> weren't the case, the policy is (to put it kindly) showing its limits.
>
> Slowing down code merges is horrible, again, if there's hardware out
> there, and someone sends code to support it, and wants to maintain it,
> then we should not be rejecting it.
>
> Otherwise we are not doing our job as an operating system kernel, our
> role is to make hardware work.  We don't get to just ignore code because
> we don't like the hardware (oh if only we could!), if a user wants to
> use it, our role is to handle that.
>
>> > > Of course there should have been a TODO file explaining the situation. But
>> > > if you think this is not the right place, I totally understand; if my
>> > > opinion had any weight in this, I would just place it in arch/riscv/kvm.
>> > >
>> > > The RISC-V acceptance policy as is just doesn't work, and the fact that
>> > > people are trying to work around it is proving it.  There are many ways to
>> > > improve it:
>> >
>> > What is this magical acceptance policy that is preventing working code
>> > from being merged?  And why is it suddenly the rest of the kernel
>> > developer's problems because of this?
>>
>> It is my problem because I am trying to help Anup merge some perfectly
>> good KVM code; when a new KVM port comes up, I coordinate merging the first
>> arch/*/kvm bits with the arch/ maintainers and from that point on that
>> directory becomes "mine" (or my submaintainers').
>
> Agreed, but the riscv maintainers should not be forcing this "problem"
> onto all of us, like it seems is starting to happen :(
>
> Ok, so, Paul, Palmer, and Albert, what do we do here?  Why can't we take
> working code like this into the kernel if someone is willing to support
> and maintain it over time?

I don't view this code as being in a state where it can be maintained, 
at least to the standards we generally set within the kernel.  The ISA 
extension in question is still subject to change, it says so right at 
the top of the H extension 
<https://github.com/riscv/riscv-isa-manual/blob/master/src/hypervisor.tex#L4>

    {\bf Warning! This draft specification may change before being
    accepted as standard by the RISC-V Foundation.}

That means we really can't rely on any of this to be compatible with 
what is eventually ratified and (hopefully, because this is really 
important stuff) widely implemented in hardware.  We've already had 
issues with other specifications where drafts were proposed as being 
ready for implementation, software was ported, and the future drafts 
were later incompatible -- we had this years ago with the debug support, 
which was a huge headache to deal with, and we're running into it again 
with these v-0.7.1 chips coming out.  I don't want to get stuck in a 
spot where we're forced to either deal with some old draft extension 
forever or end up breaking users.

Ultimately the whole RISC-V thing is only going to work out if we can 
get to the point where vendors can agree on a shared ISA.  I understand 
that there's been a lot of frustration WRT the timelines on the H 
extension, it's been frustrating for me as well.  There are clearly 
issues with how the ISA development process is being run and while those 
are coming to a head in other areas (the V extension and non-coherent 
devices, for example) I really don't think that's the case here because 
as far as I know we don't actually have any real hardware that 
implements the H extension.

All I really care about is getting to the point where we have real 
RISC-V systems running software that's as close to upstream as is 
reasonable.  As it currently stands, I don't know of anything this is 
blocking: there's some RTL implementation floating around, but that's a 
very long way from being real hardware.  Something of this complexity 
isn't suitable for a soft core, and RTL alone doesn't fix the 
fundamental problem of having a stable platform to run on (it needs a 
complex FPGA environment, and even then it's very limited in 
functionality).  I'm not sure where exactly the line for real hardware 
is, but for something like this it would at least involve some chip that 
is widely available and needs the H extension to be useful.  Such a 
system existing without a ratified extension would obviously be a major 
failing on the specification side, and while I think that's happening 
now for some systems (some of these V-0.7.1 chips, and the non-coherent 
systems) I just don't see that as the case for the H extension.  We've 
got to get to the point where the ISA extensions can be ratified in a 
timely fashion, but circumventing that process by merging code early 
doesn't fix the problem.  This really needs to be fixed at the RISC-V 
foundation, not papered over in software.

We have lots of real RISC-V hardware right now that's going to require a 
huge amount of work to support, trying to chase around a draft extension 
that may not even end up in hardware is just going to make headaches we 
don't have the time for.
Paolo Bonzini May 21, 2021, 5:21 p.m. UTC | #12
On 21/05/21 19:13, Palmer Dabbelt wrote:
>> 
> 
> I don't view this code as being in a state where it can be
> maintained, at least to the standards we generally set within the
> kernel.  The ISA extension in question is still subject to change, it
> says so right at the top of the H extension 
> <https://github.com/riscv/riscv-isa-manual/blob/master/src/hypervisor.tex#L4>
> 
>   {\bf Warning! This draft specification may change before being 
>   accepted as standard by the RISC-V Foundation.}

To give a complete picture, the last three relevant changes have been in
August 2019, November 2019 and May 2020.  It seems pretty frozen to me.

In any case, I think it's clear from the experience with Android that
the acceptance policy cannot succeed.  The only thing that such a policy
guarantees, is that vendors will use more out-of-tree code.  Keeping a
fully-developed feature out-of-tree for years is not how Linux is run.

> I'm not sure where exactly the line for real hardware is, but for
> something like this it would at least involve some chip that is
> widely available and needs the H extension to be useful

Anup said that "quite a few people have already implemented RISC-V
H-extension in hardware as well and KVM RISC-V works on real HW as 
well".  Those people would benefit from having KVM in the Linus tree.

Paolo
Greg Kroah-Hartman May 21, 2021, 5:47 p.m. UTC | #13
On Fri, May 21, 2021 at 07:21:12PM +0200, Paolo Bonzini wrote:
> On 21/05/21 19:13, Palmer Dabbelt wrote:
> > > 
> > 
> > I don't view this code as being in a state where it can be
> > maintained, at least to the standards we generally set within the
> > kernel.  The ISA extension in question is still subject to change, it
> > says so right at the top of the H extension <https://github.com/riscv/riscv-isa-manual/blob/master/src/hypervisor.tex#L4>
> > 
> >   {\bf Warning! This draft specification may change before being
> > accepted as standard by the RISC-V Foundation.}
> 
> To give a complete picture, the last three relevant changes have been in
> August 2019, November 2019 and May 2020.  It seems pretty frozen to me.
> 
> In any case, I think it's clear from the experience with Android that
> the acceptance policy cannot succeed.  The only thing that such a policy
> guarantees, is that vendors will use more out-of-tree code.  Keeping a
> fully-developed feature out-of-tree for years is not how Linux is run.
> 
> > I'm not sure where exactly the line for real hardware is, but for
> > something like this it would at least involve some chip that is
> > widely available and needs the H extension to be useful
> 
> Anup said that "quite a few people have already implemented RISC-V
> H-extension in hardware as well and KVM RISC-V works on real HW as well".
> Those people would benefit from having KVM in the Linus tree.

Great, but is this really true?  If so, what hardware has this?  I have
a new RISC-V device right here next to me, what would I need to do to
see if this is supported in it or not?

If this isn't in any hardware that anyone outside of
internal-to-company-prototypes, then let's wait until it really is in a
device that people can test this code on.

What's the rush to get this merged now if no one can use it?

thanks,

greg k-h
Palmer Dabbelt May 21, 2021, 6:08 p.m. UTC | #14
On Fri, 21 May 2021 10:47:51 PDT (-0700), Greg KH wrote:
> On Fri, May 21, 2021 at 07:21:12PM +0200, Paolo Bonzini wrote:
>> On 21/05/21 19:13, Palmer Dabbelt wrote:
>> > >
>> >
>> > I don't view this code as being in a state where it can be
>> > maintained, at least to the standards we generally set within the
>> > kernel.  The ISA extension in question is still subject to change, it
>> > says so right at the top of the H extension <https://github.com/riscv/riscv-isa-manual/blob/master/src/hypervisor.tex#L4>
>> >
>> >   {\bf Warning! This draft specification may change before being
>> > accepted as standard by the RISC-V Foundation.}
>>
>> To give a complete picture, the last three relevant changes have been in
>> August 2019, November 2019 and May 2020.  It seems pretty frozen to me.
>>
>> In any case, I think it's clear from the experience with Android that
>> the acceptance policy cannot succeed.  The only thing that such a policy
>> guarantees, is that vendors will use more out-of-tree code.  Keeping a
>> fully-developed feature out-of-tree for years is not how Linux is run.
>>
>> > I'm not sure where exactly the line for real hardware is, but for
>> > something like this it would at least involve some chip that is
>> > widely available and needs the H extension to be useful
>>
>> Anup said that "quite a few people have already implemented RISC-V
>> H-extension in hardware as well and KVM RISC-V works on real HW as well".
>> Those people would benefit from having KVM in the Linus tree.
>
> Great, but is this really true?  If so, what hardware has this?  I have
> a new RISC-V device right here next to me, what would I need to do to
> see if this is supported in it or not?

You can probe the misa register, it should have the H bit set if it 
supports the H extension.
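
To make "the H bit" concrete (a hedged sketch, not part of the thread): misa encodes the single-letter ISA extensions as bits 0..25 for 'A'..'Z', so the hypervisor extension H corresponds to bit 7. Note that misa is an M-mode CSR, so firmware (e.g. OpenSBI) is what actually reads it; from Linux userspace the "isa" line in /proc/cpuinfo is the usual proxy. The /proc/cpuinfo-style parsing below is illustrative, not something prescribed in this discussion:

```python
# Hedged sketch: locating the H extension bit in misa, and the
# userspace-visible equivalent of checking an ISA string.

MISA_H_BIT = ord('h') - ord('a')  # single-letter extensions: bit 0 = 'a', so 'h' is bit 7

def misa_has_h(misa: int) -> bool:
    """Check the H bit in a raw misa CSR value (readable only from M-mode)."""
    return bool((misa >> MISA_H_BIT) & 1)

def isa_string_has_h(isa: str) -> bool:
    """Check the single-letter extension part of an ISA string, e.g. 'rv64imafdch'.

    Multi-letter extensions (after the first '_') are ignored here.
    """
    single_letter, _, _multi = isa.lower().partition("_")
    return "h" in single_letter[len("rv64"):]  # skip the 'rv64'/'rv32' prefix

print(misa_has_h(1 << MISA_H_BIT))     # True
print(isa_string_has_h("rv64imafdc"))  # False: no H extension advertised
```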

> If this isn't in any hardware that anyone outside of
> internal-to-company-prototypes, then let's wait until it really is in a
> device that people can test this code on.
>
> What's the rush to get this merged now if no one can use it?
>
> thanks,
>
> greg k-h
Greg Kroah-Hartman May 21, 2021, 6:25 p.m. UTC | #15
On Fri, May 21, 2021 at 11:08:15AM -0700, Palmer Dabbelt wrote:
> On Fri, 21 May 2021 10:47:51 PDT (-0700), Greg KH wrote:
> > On Fri, May 21, 2021 at 07:21:12PM +0200, Paolo Bonzini wrote:
> > > On 21/05/21 19:13, Palmer Dabbelt wrote:
> > > > >
> > > >
> > > > I don't view this code as being in a state where it can be
> > > > maintained, at least to the standards we generally set within the
> > > > kernel.  The ISA extension in question is still subject to change, it
> > > > says so right at the top of the H extension <https://github.com/riscv/riscv-isa-manual/blob/master/src/hypervisor.tex#L4>
> > > >
> > > >   {\bf Warning! This draft specification may change before being
> > > > accepted as standard by the RISC-V Foundation.}
> > > 
> > > To give a complete picture, the last three relevant changes have been in
> > > August 2019, November 2019 and May 2020.  It seems pretty frozen to me.
> > > 
> > > In any case, I think it's clear from the experience with Android that
> > > the acceptance policy cannot succeed.  The only thing that such a policy
> > > guarantees, is that vendors will use more out-of-tree code.  Keeping a
> > > fully-developed feature out-of-tree for years is not how Linux is run.
> > > 
> > > > I'm not sure where exactly the line for real hardware is, but for
> > > > something like this it would at least involve some chip that is
> > > > widely available and needs the H extension to be useful
> > > 
> > > Anup said that "quite a few people have already implemented RISC-V
> > > H-extension in hardware as well and KVM RISC-V works on real HW as well".
> > > Those people would benefit from having KVM in the Linus tree.
> > 
> > Great, but is this really true?  If so, what hardware has this?  I have
> > a new RISC-V device right here next to me, what would I need to do to
> > see if this is supported in it or not?
> 
> You can probe the misa register, it should have the H bit set if it supports
> the H extension.

To let everyone know, based on our private chat we had off-list, no, the
device I have does not support this extension, so unless someone can
point me at real hardware, I don't think this code needs to be
considered for merging anywhere just yet.

thanks,

greg k-h
Paolo Bonzini May 21, 2021, 8:25 p.m. UTC | #16
On 21/05/21 19:47, Greg KH wrote:
> If this isn't in any hardware that anyone outside of
> internal-to-company-prototypes, then let's wait until it really is in a
> device that people can test this code on.
> 
> What's the rush to get this merged now if no one can use it?

There is not just hardware, there are simulators and emulators too (you 
can use QEMU to test it for example), and it's not exactly a rush since 
it's basically been ready for 2 years and has hardly seen any code 
changes since v13 which was based on Linux 5.9.
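
As a concrete illustration of the QEMU route (assumed, not from the thread: file names are placeholders and the exact flag spelling varies by QEMU version), an invocation along these lines boots an H-extension-capable RV64 guest under TCG emulation; around QEMU 5.x/6.0 the then-draft hypervisor extension was an experimental opt-in CPU property:

```shell
# Hedged sketch: boot a KVM-capable RV64 host under QEMU TCG.
# "x-h=true" enables the (draft) hypervisor extension on the emulated CPU;
# fw_jump.elf (OpenSBI) and Image (the kernel) are illustrative paths.
qemu-system-riscv64 \
    -M virt -m 2G -smp 2 -nographic \
    -cpu rv64,x-h=true \
    -bios fw_jump.elf \
    -kernel Image \
    -append "root=/dev/ram rw console=ttyS0"
```

Inside such a guest, /dev/kvm can then be exercised with the kvmtool port mentioned in the cover letter.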

Not having the code upstream is hindering further development so that 
RISC-V KVM can be feature complete when hardware does come out.  Missing 
features and optimizations could be added on top, but they are harder to 
review if they are integrated in a relatively large series instead of 
being done incrementally.  Not having the header files in Linus's tree 
makes it harder to merge RISC-V KVM support in userspace (userspace is 
shielded anyway from any future changes to the hypervisor specification, 
so there's no risk of breaking the ABI).

At some point one has to say enough is enough; for me, that is after one 
year with no changes to the spec and, especially, no deadline in sight 
for freezing it.  The last 5 versions of the patch set were just 
adapting to changes in the generic KVM code.  If the code is good, I 
don't see why the onus of doing those changes should be on Anup, rather 
than being shared amongst all KVM developers as is the case for all the 
other architectures.

Paolo
Guo Ren May 24, 2021, 7:09 a.m. UTC | #17
Thx Anup,

Tested-by: Guo Ren <guoren@kernel.org> (Just on qemu-rv64)

I'm following your KVM patchset and it's a great job for the riscv
H-extension. I think hardware companies hope Linux KVM is ready before
the real chip ships, so that we can ensure the hardware can run
mainline Linux.

Good luck!

On Wed, May 19, 2021 at 11:36 AM Anup Patel <anup.patel@wdc.com> wrote:
>
> From: Anup Patel <anup@brainfault.org>
>
> This series adds initial KVM RISC-V support. Currently, we are able to boot
> Linux on RV64/RV32 Guest with multiple VCPUs.
>
> Key aspects of KVM RISC-V added by this series are:
> 1. No RISC-V specific KVM IOCTL
> 2. Minimal possible KVM world-switch which touches only GPRs and few CSRs
> 3. Both RV64 and RV32 host supported
> 4. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure
> 5. KVM ONE_REG interface for VCPU register access from user-space
> 6. PLIC emulation is done in user-space
> 7. Timer and IPI emulation is done in-kernel
> 8. Both Sv39x4 and Sv48x4 supported for RV64 host
> 9. MMU notifiers supported
> 10. Generic dirtylog supported
> 11. FP lazy save/restore supported
> 12. SBI v0.1 emulation for KVM Guest available
> 13. Forward unhandled SBI calls to KVM userspace
> 14. Hugepage support for Guest/VM
> 15. IOEVENTFD support for Vhost
>
> Here's a brief TODO list which we will work upon after this series:
> 1. SBI v0.2 emulation in-kernel
> 2. SBI v0.2 hart state management emulation in-kernel
> 3. In-kernel PLIC emulation
> 4. ..... and more .....
>
> This series can be found in riscv_kvm_v18 branch at:
> https://github.com/avpatel/linux.git
>
> Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v7 branch
> at: https://github.com/avpatel/kvmtool.git
>
> The QEMU RISC-V hypervisor emulation is done by Alistair and is available
> in master branch at: https://git.qemu.org/git/qemu.git
>
> To play around with KVM RISC-V, refer to the KVM RISC-V wiki at:
> https://github.com/kvm-riscv/howto/wiki
> https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-QEMU
> https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-Spike
>
> Changes since v17:
>  - Rebased on Linux-5.13-rc2
>  - Moved to new KVM MMU notifier APIs
>  - Removed redundant kvm_arch_vcpu_uninit()
>  - Moved KVM RISC-V sources to drivers/staging for compliance with
>    Linux RISC-V patch acceptance policy
>
> Changes since v16:
>  - Rebased on Linux-5.12-rc5
>  - Remove redundant kvm_arch_create_memslot(), kvm_arch_vcpu_setup(),
>    kvm_arch_vcpu_init(), kvm_arch_has_vcpu_debugfs(), and
>    kvm_arch_create_vcpu_debugfs() from PATCH5
>  - Make stage2_wp_memory_region() and stage2_ioremap() as static
>    in PATCH13
>
> Changes since v15:
>  - Rebased on Linux-5.11-rc3
>  - Fixed kvm_stage2_map() to use gfn_to_pfn_prot() for determining
>    writability of a host pfn.
>  - Use "__u64" in-place of "u64" and "__u32" in-place of "u32" for
>    uapi/asm/kvm.h
>
> Changes since v14:
>  - Rebased on Linux-5.10-rc3
>  - Fixed Stage2 (G-stage) PGD allocation to ensure it is 16KB aligned
>
> Changes since v13:
>  - Rebased on Linux-5.9-rc3
>  - Fixed kvm_riscv_vcpu_set_reg_csr() for SIP update in PATCH5
>  - Fixed instruction length computation in PATCH7
>  - Added ioeventfd support in PATCH7
>  - Ensure HSTATUS.SPVP is set to correct value before using HLV/HSV
>    instructions in PATCH7
>  - Fixed stage2_map_page() to set PTE 'A' and 'D' bits correctly
>    in PATCH10
>  - Added stage2 dirty page logging in PATCH10
>  - Allow KVM user-space to SET/GET SCOUNTER CSR in PATCH5
>  - Save/restore SCOUNTEREN in PATCH6
>  - Reduced quite a few instructions for __kvm_riscv_switch_to() by
>    using CSR swap instruction in PATCH6
>  - Detect and use Sv48x4 when available in PATCH10
>
> Changes since v12:
>  - Rebased patches on Linux-5.8-rc4
>  - By default enable all counters in HCOUNTEREN
>  - RISC-V H-Extension v0.6.1 spec support
>
> Changes since v11:
>  - Rebased patches on Linux-5.7-rc3
>  - Fixed typo in typecast of stage2_map_size define
>  - Introduced struct kvm_cpu_trap to represent trap details and
>    use it as function parameter wherever applicable
>  - Pass memslot to kvm_riscv_stage2_map() for supporting dirty page
>    logging in future
>  - RISC-V H-Extension v0.6 spec support
>  - Send-out first three patches as separate series so that it can
>    be taken by Palmer for Linux RISC-V
>
> Changes since v10:
>  - Rebased patches on Linux-5.6-rc5
>  - Reduce RISCV_ISA_EXT_MAX from 256 to 64
>  - Separate PATCH for removing N-extension related defines
>  - Added comments as requested by Palmer
>  - Fixed HIDELEG CSR programming
>
> Changes since v9:
>  - Rebased patches on Linux-5.5-rc3
>  - Squash PATCH19 and PATCH20 into PATCH5
>  - Squash PATCH18 into PATCH11
>  - Squash PATCH17 into PATCH16
>  - Added ONE_REG interface for VCPU timer in PATCH13
>  - Use HTIMEDELTA for VCPU timer in PATCH13
>  - Updated KVM RISC-V mailing list in MAINTAINERS entry
>  - Update KVM kconfig option to depend on RISCV_SBI and MMU
>  - Check for SBI v0.2 and SBI v0.2 RFENCE extension at boot-time
>  - Use SBI v0.2 RFENCE extension in VMID implementation
>  - Use SBI v0.2 RFENCE extension in Stage2 MMU implementation
>  - Use SBI v0.2 RFENCE extension in SBI implementation
>  - Moved to RISC-V Hypervisor v0.5 draft spec
>  - Updated Documentation/virt/kvm/api.txt for timer ONE_REG interface
>
> Changes since v8:
>  - Rebased series on Linux-5.4-rc3 and Atish's SBI v0.2 patches
>  - Use HRTIMER_MODE_REL instead of HRTIMER_MODE_ABS in timer emulation
>  - Fixed kvm_riscv_stage2_map() to handle hugepages
>  - Added patch to forward unhandled SBI calls to user-space
>  - Added patch for iterative/recursive stage2 page table programming
>  - Added patch to remove per-CPU vsip_shadow variable
>  - Added patch to fix race-condition in kvm_riscv_vcpu_sync_interrupts()
>
> Changes since v7:
>  - Rebased series on Linux-5.4-rc1 and Atish's SBI v0.2 patches
>  - Removed PATCH1, PATCH3, and PATCH20 because these already merged
>  - Use kernel doc style comments for ISA bitmap functions
>  - Don't parse X, Y, and Z extension in riscv_fill_hwcap() because it will
>    be added in-future
>  - Mark KVM RISC-V kconfig option as EXPERIMENTAL
>  - Typo fix in commit description of PATCH6 of v7 series
>  - Use separate structs for CORE and CSR registers of ONE_REG interface
>  - Explicitly include asm/sbi.h in kvm/vcpu_sbi.c
>  - Removed implicit switch-case fall-through in kvm_riscv_vcpu_exit()
>  - No need to set VSSTATUS.MXR bit in kvm_riscv_vcpu_unpriv_read()
>  - Removed register for instruction length in kvm_riscv_vcpu_unpriv_read()
>  - Added defines for checking/decoding instruction length
>  - Added separate patch to forward unhandled SBI calls to userspace tool
>
> Changes since v6:
>  - Rebased patches on Linux-5.3-rc7
>  - Added "return_handled" in struct kvm_mmio_decode to ensure that
>    kvm_riscv_vcpu_mmio_return() updates SEPC only once
>  - Removed trap_stval parameter from kvm_riscv_vcpu_unpriv_read()
>  - Updated git repo URL in MAINTAINERS entry
>
> Changes since v5:
>  - Renamed KVM_REG_RISCV_CONFIG_TIMEBASE register to
>    KVM_REG_RISCV_CONFIG_TBFREQ register in ONE_REG interface
>  - Update SEPC in kvm_riscv_vcpu_mmio_return() for MMIO exits
>  - Use switch case instead of illegal instruction opcode table for simplicity
>  - Improve comments in stage2_remote_tlb_flush() for a potential remote TLB
>   flush optimization
>  - Handle all unsupported SBI calls in default case of
>    kvm_riscv_vcpu_sbi_ecall() function
>  - Fixed kvm_riscv_vcpu_sync_interrupts() for software interrupts
>  - Improved unprivileged reads to handle traps due to Guest stage1 page table
>  - Added separate patch to document RISC-V specific things in
>    Documentation/virt/kvm/api.txt
>
> Changes since v4:
>  - Rebased patches on Linux-5.3-rc5
>  - Added Paolo's Acked-by and Reviewed-by
>  - Updated mailing list in MAINTAINERS entry
>
> Changes since v3:
>  - Moved patch for ISA bitmap from KVM prep series to this series
>  - Make vsip_shadow as run-time percpu variable instead of compile-time
>  - Flush Guest TLBs on all Host CPUs whenever we run-out of VMIDs
>
> Changes since v2:
>  - Removed references of KVM_REQ_IRQ_PENDING from all patches
>  - Use kvm->srcu within in-kernel KVM run loop
>  - Added percpu vsip_shadow to track last value programmed in VSIP CSR
>  - Added comments about irqs_pending and irqs_pending_mask
>  - Used kvm_arch_vcpu_runnable() in-place-of kvm_riscv_vcpu_has_interrupt()
>    in system_opcode_insn()
>  - Removed unwanted smp_wmb() in kvm_riscv_stage2_vmid_update()
>  - Use kvm_flush_remote_tlbs() in kvm_riscv_stage2_vmid_update()
>  - Use READ_ONCE() in kvm_riscv_stage2_update_hgatp() for vmid
>
> Changes since v1:
>  - Fixed compile errors in building KVM RISC-V as module
>  - Removed unused kvm_riscv_halt_guest() and kvm_riscv_resume_guest()
>  - Set KVM_CAP_SYNC_MMU capability only after MMU notifiers are implemented
>  - Made vmid_version as unsigned long instead of atomic
>  - Renamed KVM_REQ_UPDATE_PGTBL to KVM_REQ_UPDATE_HGATP
>  - Renamed kvm_riscv_stage2_update_pgtbl() to kvm_riscv_stage2_update_hgatp()
>  - Configure HIDELEG and HEDELEG in kvm_arch_hardware_enable()
>  - Updated ONE_REG interface for CSR access to user-space
>  - Removed irqs_pending_lock and use atomic bitops instead
>  - Added separate patch for FP ONE_REG interface
>  - Added separate patch for updating MAINTAINERS file
>
> Anup Patel (14):
>   RISC-V: Add hypervisor extension related CSR defines
>   RISC-V: Add initial skeletal KVM support
>   RISC-V: KVM: Implement VCPU create, init and destroy functions
>   RISC-V: KVM: Implement VCPU interrupts and requests handling
>   RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
>   RISC-V: KVM: Implement VCPU world-switch
>   RISC-V: KVM: Handle MMIO exits for VCPU
>   RISC-V: KVM: Handle WFI exits for VCPU
>   RISC-V: KVM: Implement VMID allocator
>   RISC-V: KVM: Implement stage2 page table programming
>   RISC-V: KVM: Implement MMU notifiers
>   RISC-V: KVM: Document RISC-V specific parts of KVM API
>   RISC-V: KVM: Move sources to drivers/staging directory
>   RISC-V: KVM: Add MAINTAINERS entry
>
> Atish Patra (4):
>   RISC-V: KVM: Add timer functionality
>   RISC-V: KVM: FP lazy save/restore
>   RISC-V: KVM: Implement ONE REG interface for FP registers
>   RISC-V: KVM: Add SBI v0.1 support
>
>  Documentation/virt/kvm/api.rst                | 193 +++-
>  MAINTAINERS                                   |  11 +
>  arch/riscv/Kconfig                            |   1 +
>  arch/riscv/Makefile                           |   1 +
>  arch/riscv/include/uapi/asm/kvm.h             | 128 +++
>  drivers/clocksource/timer-riscv.c             |   9 +
>  drivers/staging/riscv/kvm/Kconfig             |  36 +
>  drivers/staging/riscv/kvm/Makefile            |  23 +
>  drivers/staging/riscv/kvm/asm/kvm_csr.h       | 105 ++
>  drivers/staging/riscv/kvm/asm/kvm_host.h      | 271 +++++
>  drivers/staging/riscv/kvm/asm/kvm_types.h     |   7 +
>  .../staging/riscv/kvm/asm/kvm_vcpu_timer.h    |  44 +
>  drivers/staging/riscv/kvm/main.c              | 118 +++
>  drivers/staging/riscv/kvm/mmu.c               | 802 ++++++++++++++
>  drivers/staging/riscv/kvm/riscv_offsets.c     | 170 +++
>  drivers/staging/riscv/kvm/tlb.S               |  74 ++
>  drivers/staging/riscv/kvm/vcpu.c              | 997 ++++++++++++++++++
>  drivers/staging/riscv/kvm/vcpu_exit.c         | 701 ++++++++++++
>  drivers/staging/riscv/kvm/vcpu_sbi.c          | 173 +++
>  drivers/staging/riscv/kvm/vcpu_switch.S       | 401 +++++++
>  drivers/staging/riscv/kvm/vcpu_timer.c        | 225 ++++
>  drivers/staging/riscv/kvm/vm.c                |  81 ++
>  drivers/staging/riscv/kvm/vmid.c              | 120 +++
>  include/clocksource/timer-riscv.h             |  16 +
>  include/uapi/linux/kvm.h                      |   8 +
>  25 files changed, 4706 insertions(+), 9 deletions(-)
>  create mode 100644 arch/riscv/include/uapi/asm/kvm.h
>  create mode 100644 drivers/staging/riscv/kvm/Kconfig
>  create mode 100644 drivers/staging/riscv/kvm/Makefile
>  create mode 100644 drivers/staging/riscv/kvm/asm/kvm_csr.h
>  create mode 100644 drivers/staging/riscv/kvm/asm/kvm_host.h
>  create mode 100644 drivers/staging/riscv/kvm/asm/kvm_types.h
>  create mode 100644 drivers/staging/riscv/kvm/asm/kvm_vcpu_timer.h
>  create mode 100644 drivers/staging/riscv/kvm/main.c
>  create mode 100644 drivers/staging/riscv/kvm/mmu.c
>  create mode 100644 drivers/staging/riscv/kvm/riscv_offsets.c
>  create mode 100644 drivers/staging/riscv/kvm/tlb.S
>  create mode 100644 drivers/staging/riscv/kvm/vcpu.c
>  create mode 100644 drivers/staging/riscv/kvm/vcpu_exit.c
>  create mode 100644 drivers/staging/riscv/kvm/vcpu_sbi.c
>  create mode 100644 drivers/staging/riscv/kvm/vcpu_switch.S
>  create mode 100644 drivers/staging/riscv/kvm/vcpu_timer.c
>  create mode 100644 drivers/staging/riscv/kvm/vm.c
>  create mode 100644 drivers/staging/riscv/kvm/vmid.c
>  create mode 100644 include/clocksource/timer-riscv.h
>
> --
> 2.25.1
>
Palmer Dabbelt May 24, 2021, 10:57 p.m. UTC | #18
On Mon, 24 May 2021 00:09:45 PDT (-0700), guoren@kernel.org wrote:
> Thx Anup,
>
> Tested-by: Guo Ren <guoren@kernel.org> (Just on qemu-rv64)
>
> I'm following your KVM patchset and it's a great job for riscv
> H-extension. I think hardware companies hope Linux KVM ready first
> before the real chip. That means we can ensure the hardware could run
> mainline linux.

I understand that it would be wonderful for hardware vendors to have a 
guarantee that their hardware will be supported by the software 
ecosystem, but that's not what we're talking about here.  Specifically, 
the proposal for this code is to track the latest draft extension, which 
would leave vendors who implement the current draft out in the cold if 
something were to change.  In practice that is the only way to 
move forward with any draft extension that doesn't have hardware 
available, as the software RISC-V implementations rapidly deprecate 
draft extensions and without a way to test our code it is destined to 
bit rot.

If vendors want to make sure their hardware is supported then the best 
way to do that is to make sure specifications get ratified in a timely 
fashion that describe the behavior required from their products.  That 
way we have an agreed upon interface that vendors can implement and 
software can rely on.  I understand that a lot of people are frustrated 
with the pace of that process when it comes to the H extension, but 
circumventing that process doesn't fix the fundamental problem.  If 
there really are products out there that people can't build because the 
H extension isn't upstream then we need to have a serious discussion 
about those, but without something specific to discuss this is just 
going to devolve into speculation which isn't a good use of time.

I can't find any hardware that implements the draft H extension, at 
least via poking around on Google and in my email.  I'm very hesitant to 
talk about private vendor roadmaps in public, as that's getting way too 
close to my day job, but I've yet to have a vendor raise this as an 
issue to me privately and I do try my best to make sure to talk to the 
RISC-V hardware vendors whenever possible (though I try to stick to 
public roadmaps there, to avoid issues around discussions like this and 
conflicts with work).  Anup is clearly in a much more privileged 
position than I am here, given that he has real hardware and is able to 
allude to vendor roadmaps that I can't find in public, but until we can 
all get on the same page about that it's going to be difficult to have a 
reasonable discussion -- if we all have different information we're 
naturally going to arrive at different conclusions, which IMO is why 
this argument just keeps coming up.  It's totally possible I'm just 
missing something here, in which case I'd love to be corrected as we can 
be having a very different discussion.

I certainly hope that vendors understand that we're willing to work with 
them when it comes to making the software run on their hardware.  I've 
always tried to be quite explicit that's our goal here, both by just 
saying so and by demonstrating that we're willing to take code that 
exhibits behavior not specified by the specifications but that is 
necessary to make real hardware work.  There's always a balance here and 
I can't commit to making every chip with a RISC-V logo on it run Linux 
well, as there will always be some implementations that are just 
impossible to run real code on, but I'm always willing to do whatever I 
can to try to make things work.

If anyone has concrete concerns about RISC-V hardware not being 
supported by Linux then I'm happy to have a discussion about that.  
Having a discussion in public is always best, as then everyone can be on 
the same page, but as far as I know we're doing a good job supporting 
the publicly available hardware -- not saying we're perfect, but given 
the size of the RISC-V user base and how new much of the hardware is I 
think we're well above average when it comes to upstream support of real 
hardware.  I have a feeling anyone's worries would be about unreleased 
hardware, in which case I can understand it's difficult to have concrete 
discussions in public.  I'm always happy to at least make an attempt to 
have private discussions about these (private discussion are tricky, 
though, so I can't promise I can always participate), and while I don't 
think those discussions should meaningfully sway the kernel's policies 
one way or the other it could at least help alleviate any acute concerns 
that vendors have.  We've gotten to the point where some pretty serious 
accusations are starting to get thrown around, and that sort of thing 
really doesn't benefit anyone so I'm willing to do whatever I can to 
help fix that.

IMO we're just trying to follow the standard Linux development policies 
here, where the focus is on making real hardware work in a way that can 
be sustainably maintained so we don't break users.  If anything I think 
we're a notch more liberal WRT the code we accept than is standard with the 
current policy, as accepting anything in a frozen extension doesn't even 
require a commitment from a hardware vendor WRT implementing the code.  
That obviously opens us up to behavior differences between the hardware 
and the specification, which have historically been retrofitted back to 
the specifications, but I'm willing to take on the extra work as it 
helps lend weight to the specification development process.

If I'm just missing something here and there is publicly available 
hardware that implements the H extension then I'd be happy to have that 
pointed out and very much change the tune of this discussion, but until 
hardware shows up or the ISA is frozen I just don't see any way to 
maintain this code up to the standards generally set by Linux or 
specifically by arch/riscv, and therefore cannot support merging it.

>
> Good luck!
>
> On Wed, May 19, 2021 at 11:36 AM Anup Patel <anup.patel@wdc.com> wrote:
>>
>> [...]
>>
Damien Le Moal May 24, 2021, 11:08 p.m. UTC | #19
On 2021/05/25 7:57, Palmer Dabbelt wrote:
> On Mon, 24 May 2021 00:09:45 PDT (-0700), guoren@kernel.org wrote:
>> Thx Anup,
>>
>> Tested-by: Guo Ren <guoren@kernel.org> (Just on qemu-rv64)
>>
>> I'm following your KVM patchset and it's a great job for riscv
>> H-extension. I think hardware companies hope Linux KVM ready first
>> before the real chip. That means we can ensure the hardware could run
>> mainline linux.
> 
> I understand that it would be wonderful for hardware vendors to have a 
> guarantee that their hardware will be supported by the software 
> ecosystem, but that's not what we're talking about here.  Specifically, 
> the proposal for this code is to track the latest draft extension, which 
> would leave vendors who implement the current draft out in the cold if 
> something were to change.  In practice that is the only way to 
> move forward with any draft extension that doesn't have hardware 
> available, as the software RISC-V implementations rapidly deprecate 
> draft extensions and without a way to test our code it is destined to 
> bit rot.

To facilitate the process of implementing, and later updating, code against
draft specifications, I proposed adding arch/riscv/staging. This would be the
place to put code based on draft specifications. Some simple rules can be put in place:
1) The code and eventual ABI may change any time, no guarantees of backward
compatibility
2) Once the specifications are frozen, the code is moved out of staging
somewhere else.
3) The code may be removed at any time if the specification proposal is dropped,
or for any other valid reason (can't think of any other right now)
4) ...

This way, the implementation process would be greatly facilitated and
interactions between different extensions can be explored much more easily.

Thoughts ?



> 
> If vendors want to make sure their hardware is supported then the best 
> way to do that is to make sure specifications get ratified in a timely 
> fashion that describe the behavior required from their products.  That 
> way we have an agreed upon interface that vendors can implement and 
> software can rely on.  I understand that a lot of people are frustrated 
> with the pace of that process when it comes to the H extension, but 
> circumventing that process doesn't fix the fundamental problem.  If 
> there really are products out there that people can't build because the 
> H extension isn't upstream then we need to have a serious discussion 
> about those, but without something specific to discuss this is just 
> going to devolve into speculation which isn't a good use of time.
> 
> I can't find any hardware that implements the draft H extension, at 
> least via poking around on Google and in my email.  I'm very hesitant to 
> talk about private vendor roadmaps in public, as that's getting way too 
> close to my day job, but I've yet to have a vendor raise this as an 
> issue to me privately and I do try my best to make sure to talk to the 
> RISC-V hardware vendors whenever possible (though I try to stick to 
> public roadmaps there, to avoid issues around discussions like this and 
> conflicts with work).  Anup is clearly in a much more privileged 
> position than I am here, given that he has real hardware and is able to 
> allude to vendor roadmaps that I can't find in public, but until we can 
> all get on the same page about that it's going to be difficult to have a 
> reasonable discussion -- if we all have different information we're 
> naturally going to arrive at different conclusions, which IMO is why 
> this argument just keeps coming up.  It's totally possible I'm just 
> missing something here, in which case I'd love to be corrected as we can 
> be having a very different discussion.
> 
> I certainly hope that vendors understand that we're willing to work with 
> them when it comes to making the software run on their hardware.  I've 
> always tried to be quite explicit that's our goal here, both by just 
> saying so and by demonstrating that we're willing to take code that 
> exhibits behavior not specified by the specifications but that is 
> necessary to make real hardware work.  There's always a balance here and 
> I can't commit to making every chip with a RISC-V logo on it run Linux 
> well, as there will always be some implementations that are just 
> impossible to run real code on, but I'm always willing to do whatever I 
> can to try to make things work.
> 
> If anyone has concrete concerns about RISC-V hardware not being 
> supported by Linux then I'm happy to have a discussion about that.  
> Having a discussion in public is always best, as then everyone can be on 
> the same page, but as far as I know we're doing a good job supporting 
> the publicly available hardware -- not saying we're perfect, but given 
> the size of the RISC-V user base and how new much of the hardware is I 
> think we're well above average when it comes to upstream support of real 
> hardware.  I have a feeling anyone's worries would be about unreleased 
> hardware, in which case I can understand it's difficult to have concrete 
> discussions in public.  I'm always happy to at least make an attempt to 
> have private discussions about these (private discussion are tricky, 
> though, so I can't promise I can always participate), and while I don't 
> think those discussions should meaningfully sway the kernel's policies 
> one way or the other it could at least help alleviate any acute concerns 
> that vendors have.  We've gotten to the point where some pretty serious 
> accusations are starting to get thrown around, and that sort of thing 
> really doesn't benefit anyone so I'm willing to do whatever I can to 
> help fix that.
> 
> IMO we're just trying to follow the standard Linux development policies 
> here, where the focus is on making real hardware work in a way that can 
> be sustainably maintained so we don't break users.  If anything I think 
> we're a notch more liberal WRT the code we accept than is standard with the 
> current policy, as accepting anything in a frozen extension doesn't even 
> require a commitment from a hardware vendor WRT implementing the code.  
> That obviously opens us up to behavior differences between the hardware 
> and the specification, which have historically been retrofitted back to 
> the specifications, but I'm willing to take on the extra work as it 
> helps lend weight to the specification development process.
> 
> If I'm just missing something here and there is publicly available 
> hardware that implements the H extension then I'd be happy to have that 
> pointed out and very much change the tune of this discussion, but until 
> hardware shows up or the ISA is frozen I just don't see any way to 
> maintain this code up to the standards generally set by Linux or 
> specifically by arch/riscv and therefore cannot support merging it.
> 
>>
>> Good luck!
>>
>> On Wed, May 19, 2021 at 11:36 AM Anup Patel <anup.patel@wdc.com> wrote:
>>>
>>> From: Anup Patel <anup@brainfault.org>
>>>
>>> This series adds initial KVM RISC-V support. Currently, we are able to boot
>>> Linux on RV64/RV32 Guest with multiple VCPUs.
>>>
>>> Key aspects of KVM RISC-V added by this series are:
>>> 1. No RISC-V specific KVM IOCTL
>>> 2. Minimal possible KVM world-switch which touches only GPRs and few CSRs
>>> 3. Both RV64 and RV32 host supported
>>> 4. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure
>>> 5. KVM ONE_REG interface for VCPU register access from user-space
>>> 6. PLIC emulation is done in user-space
>>> 7. Timer and IPI emulation is done in-kernel
>>> 8. Both Sv39x4 and Sv48x4 supported for RV64 host
>>> 9. MMU notifiers supported
>>> 10. Generic dirtylog supported
>>> 11. FP lazy save/restore supported
>>> 12. SBI v0.1 emulation for KVM Guest available
>>> 13. Forward unhandled SBI calls to KVM userspace
>>> 14. Hugepage support for Guest/VM
>>> 15. IOEVENTFD support for Vhost
>>>
>>> Here's a brief TODO list which we will work upon after this series:
>>> 1. SBI v0.2 emulation in-kernel
>>> 2. SBI v0.2 hart state management emulation in-kernel
>>> 3. In-kernel PLIC emulation
>>> 4. ..... and more .....
>>>
>>> This series can be found in riscv_kvm_v18 branch at:
>>> https://github.com/avpatel/linux.git
>>>
>>> Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v7 branch
>>> at: https://github.com/avpatel/kvmtool.git
>>>
>>> The QEMU RISC-V hypervisor emulation is done by Alistair and is available
>>> in master branch at: https://git.qemu.org/git/qemu.git
>>>
>>> To play around with KVM RISC-V, refer to the KVM RISC-V wiki at:
>>> https://github.com/kvm-riscv/howto/wiki
>>> https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-QEMU
>>> https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-Spike
>>>
>>> Changes since v17:
>>>  - Rebased on Linux-5.13-rc2
>>>  - Moved to new KVM MMU notifier APIs
>>>  - Removed redundant kvm_arch_vcpu_uninit()
>>>  - Moved KVM RISC-V sources to drivers/staging for compliance with
>>>    Linux RISC-V patch acceptance policy
>>>
>>> Changes since v16:
>>>  - Rebased on Linux-5.12-rc5
>>>  - Remove redundant kvm_arch_create_memslot(), kvm_arch_vcpu_setup(),
>>>    kvm_arch_vcpu_init(), kvm_arch_has_vcpu_debugfs(), and
>>>    kvm_arch_create_vcpu_debugfs() from PATCH5
>>>  - Make stage2_wp_memory_region() and stage2_ioremap() as static
>>>    in PATCH13
>>>
>>> Changes since v15:
>>>  - Rebased on Linux-5.11-rc3
>>>  - Fixed kvm_stage2_map() to use gfn_to_pfn_prot() for determining
>>>    writability of a host pfn.
>>>  - Use "__u64" in-place of "u64" and "__u32" in-place of "u32" for
>>>    uapi/asm/kvm.h
>>>
>>> Changes since v14:
>>>  - Rebased on Linux-5.10-rc3
>>>  - Fixed Stage2 (G-stage) PDG allocation to ensure it is 16KB aligned
>>>
>>> Changes since v13:
>>>  - Rebased on Linux-5.9-rc3
>>>  - Fixed kvm_riscv_vcpu_set_reg_csr() for SIP update in PATCH5
>>>  - Fixed instruction length computation in PATCH7
>>>  - Added ioeventfd support in PATCH7
>>>  - Ensure HSTATUS.SPVP is set to the correct value before using HLV/HSV
>>>    instructions in PATCH7
>>>  - Fixed stage2_map_page() to set PTE 'A' and 'D' bits correctly
>>>    in PATCH10
>>>  - Added stage2 dirty page logging in PATCH10
>>>  - Allow KVM user-space to SET/GET SCOUNTER CSR in PATCH5
>>>  - Save/restore SCOUNTEREN in PATCH6
>>>  - Reduced quite a few instructions for __kvm_riscv_switch_to() by
>>>    using CSR swap instruction in PATCH6
>>>  - Detect and use Sv48x4 when available in PATCH10
>>>
>>> Changes since v12:
>>>  - Rebased patches on Linux-5.8-rc4
>>>  - By default enable all counters in HCOUNTEREN
>>>  - RISC-V H-Extension v0.6.1 spec support
>>>
>>> Changes since v11:
>>>  - Rebased patches on Linux-5.7-rc3
>>>  - Fixed typo in typecast of stage2_map_size define
>>>  - Introduced struct kvm_cpu_trap to represent trap details and
>>>    use it as function parameter wherever applicable
>>>  - Pass memslot to kvm_riscv_stage2_map() for supporting dirty page
>>>    logging in the future
>>>  - RISC-V H-Extension v0.6 spec support
>>>  - Send-out first three patches as separate series so that it can
>>>    be taken by Palmer for Linux RISC-V
>>>
>>> Changes since v10:
>>>  - Rebased patches on Linux-5.6-rc5
>>>  - Reduce RISCV_ISA_EXT_MAX from 256 to 64
>>>  - Separate PATCH for removing N-extension related defines
>>>  - Added comments as requested by Palmer
>>>  - Fixed HIDELEG CSR programming
>>>
>>> Changes since v9:
>>>  - Rebased patches on Linux-5.5-rc3
>>>  - Squash PATCH19 and PATCH20 into PATCH5
>>>  - Squash PATCH18 into PATCH11
>>>  - Squash PATCH17 into PATCH16
>>>  - Added ONE_REG interface for VCPU timer in PATCH13
>>>  - Use HTIMEDELTA for VCPU timer in PATCH13
>>>  - Updated KVM RISC-V mailing list in MAINTAINERS entry
>>>  - Update KVM kconfig option to depend on RISCV_SBI and MMU
>>>  - Check for SBI v0.2 and SBI v0.2 RFENCE extension at boot-time
>>>  - Use SBI v0.2 RFENCE extension in VMID implementation
>>>  - Use SBI v0.2 RFENCE extension in Stage2 MMU implementation
>>>  - Use SBI v0.2 RFENCE extension in SBI implementation
>>>  - Moved to RISC-V Hypervisor v0.5 draft spec
>>>  - Updated Documentation/virt/kvm/api.txt for timer ONE_REG interface
>>>
>>> Changes since v8:
>>>  - Rebased series on Linux-5.4-rc3 and Atish's SBI v0.2 patches
>>>  - Use HRTIMER_MODE_REL instead of HRTIMER_MODE_ABS in timer emulation
>>>  - Fixed kvm_riscv_stage2_map() to handle hugepages
>>>  - Added patch to forward unhandled SBI calls to user-space
>>>  - Added patch for iterative/recursive stage2 page table programming
>>>  - Added patch to remove per-CPU vsip_shadow variable
>>>  - Added patch to fix race-condition in kvm_riscv_vcpu_sync_interrupts()
>>>
>>> Changes since v7:
>>>  - Rebased series on Linux-5.4-rc1 and Atish's SBI v0.2 patches
>>>  - Removed PATCH1, PATCH3, and PATCH20 because these already merged
>>>  - Use kernel doc style comments for ISA bitmap functions
>>>  - Don't parse X, Y, and Z extension in riscv_fill_hwcap() because it will
>>>    be added in-future
>>>  - Mark KVM RISC-V kconfig option as EXPERIMENTAL
>>>  - Typo fix in commit description of PATCH6 of v7 series
>>>  - Use separate structs for CORE and CSR registers of ONE_REG interface
>>>  - Explicitly include asm/sbi.h in kvm/vcpu_sbi.c
>>>  - Removed implicit switch-case fall-through in kvm_riscv_vcpu_exit()
>>>  - No need to set VSSTATUS.MXR bit in kvm_riscv_vcpu_unpriv_read()
>>>  - Removed register for instruction length in kvm_riscv_vcpu_unpriv_read()
>>>  - Added defines for checking/decoding instruction length
>>>  - Added separate patch to forward unhandled SBI calls to userspace tool
>>>
>>> Changes since v6:
>>>  - Rebased patches on Linux-5.3-rc7
>>>  - Added "return_handled" in struct kvm_mmio_decode to ensure that
>>>    kvm_riscv_vcpu_mmio_return() updates SEPC only once
>>>  - Removed trap_stval parameter from kvm_riscv_vcpu_unpriv_read()
>>>  - Updated git repo URL in MAINTAINERS entry
>>>
>>> Changes since v5:
>>>  - Renamed KVM_REG_RISCV_CONFIG_TIMEBASE register to
>>>    KVM_REG_RISCV_CONFIG_TBFREQ register in ONE_REG interface
>>>  - Update SEPC in kvm_riscv_vcpu_mmio_return() for MMIO exits
>>>  - Use switch case instead of illegal instruction opcode table for simplicity
>>>  - Improve comments in stage2_remote_tlb_flush() for a potential remote TLB
>>>    flush optimization
>>>  - Handle all unsupported SBI calls in default case of
>>>    kvm_riscv_vcpu_sbi_ecall() function
>>>  - Fixed kvm_riscv_vcpu_sync_interrupts() for software interrupts
>>>  - Improved unprivileged reads to handle traps due to Guest stage1 page table
>>>  - Added separate patch to document RISC-V specific things in
>>>    Documentation/virt/kvm/api.txt
>>>
>>> Changes since v4:
>>>  - Rebased patches on Linux-5.3-rc5
>>>  - Added Paolo's Acked-by and Reviewed-by
>>>  - Updated mailing list in MAINTAINERS entry
>>>
>>> Changes since v3:
>>>  - Moved patch for ISA bitmap from KVM prep series to this series
>>>  - Make vsip_shadow as run-time percpu variable instead of compile-time
>>>  - Flush Guest TLBs on all Host CPUs whenever we run-out of VMIDs
>>>
>>> Changes since v2:
>>>  - Removed references of KVM_REQ_IRQ_PENDING from all patches
>>>  - Use kvm->srcu within in-kernel KVM run loop
>>>  - Added percpu vsip_shadow to track last value programmed in VSIP CSR
>>>  - Added comments about irqs_pending and irqs_pending_mask
>>>  - Used kvm_arch_vcpu_runnable() in-place-of kvm_riscv_vcpu_has_interrupt()
>>>    in system_opcode_insn()
>>>  - Removed unwanted smp_wmb() in kvm_riscv_stage2_vmid_update()
>>>  - Use kvm_flush_remote_tlbs() in kvm_riscv_stage2_vmid_update()
>>>  - Use READ_ONCE() in kvm_riscv_stage2_update_hgatp() for vmid
>>>
>>> Changes since v1:
>>>  - Fixed compile errors in building KVM RISC-V as module
>>>  - Removed unused kvm_riscv_halt_guest() and kvm_riscv_resume_guest()
>>>  - Set KVM_CAP_SYNC_MMU capability only after MMU notifiers are implemented
>>>  - Made vmid_version as unsigned long instead of atomic
>>>  - Renamed KVM_REQ_UPDATE_PGTBL to KVM_REQ_UPDATE_HGATP
>>>  - Renamed kvm_riscv_stage2_update_pgtbl() to kvm_riscv_stage2_update_hgatp()
>>>  - Configure HIDELEG and HEDELEG in kvm_arch_hardware_enable()
>>>  - Updated ONE_REG interface for CSR access to user-space
>>>  - Removed irqs_pending_lock and use atomic bitops instead
>>>  - Added separate patch for FP ONE_REG interface
>>>  - Added separate patch for updating MAINTAINERS file
>>>
>>> Anup Patel (14):
>>>   RISC-V: Add hypervisor extension related CSR defines
>>>   RISC-V: Add initial skeletal KVM support
>>>   RISC-V: KVM: Implement VCPU create, init and destroy functions
>>>   RISC-V: KVM: Implement VCPU interrupts and requests handling
>>>   RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
>>>   RISC-V: KVM: Implement VCPU world-switch
>>>   RISC-V: KVM: Handle MMIO exits for VCPU
>>>   RISC-V: KVM: Handle WFI exits for VCPU
>>>   RISC-V: KVM: Implement VMID allocator
>>>   RISC-V: KVM: Implement stage2 page table programming
>>>   RISC-V: KVM: Implement MMU notifiers
>>>   RISC-V: KVM: Document RISC-V specific parts of KVM API
>>>   RISC-V: KVM: Move sources to drivers/staging directory
>>>   RISC-V: KVM: Add MAINTAINERS entry
>>>
>>> Atish Patra (4):
>>>   RISC-V: KVM: Add timer functionality
>>>   RISC-V: KVM: FP lazy save/restore
>>>   RISC-V: KVM: Implement ONE REG interface for FP registers
>>>   RISC-V: KVM: Add SBI v0.1 support
>>>
>>>  Documentation/virt/kvm/api.rst                | 193 +++-
>>>  MAINTAINERS                                   |  11 +
>>>  arch/riscv/Kconfig                            |   1 +
>>>  arch/riscv/Makefile                           |   1 +
>>>  arch/riscv/include/uapi/asm/kvm.h             | 128 +++
>>>  drivers/clocksource/timer-riscv.c             |   9 +
>>>  drivers/staging/riscv/kvm/Kconfig             |  36 +
>>>  drivers/staging/riscv/kvm/Makefile            |  23 +
>>>  drivers/staging/riscv/kvm/asm/kvm_csr.h       | 105 ++
>>>  drivers/staging/riscv/kvm/asm/kvm_host.h      | 271 +++++
>>>  drivers/staging/riscv/kvm/asm/kvm_types.h     |   7 +
>>>  .../staging/riscv/kvm/asm/kvm_vcpu_timer.h    |  44 +
>>>  drivers/staging/riscv/kvm/main.c              | 118 +++
>>>  drivers/staging/riscv/kvm/mmu.c               | 802 ++++++++++++++
>>>  drivers/staging/riscv/kvm/riscv_offsets.c     | 170 +++
>>>  drivers/staging/riscv/kvm/tlb.S               |  74 ++
>>>  drivers/staging/riscv/kvm/vcpu.c              | 997 ++++++++++++++++++
>>>  drivers/staging/riscv/kvm/vcpu_exit.c         | 701 ++++++++++++
>>>  drivers/staging/riscv/kvm/vcpu_sbi.c          | 173 +++
>>>  drivers/staging/riscv/kvm/vcpu_switch.S       | 401 +++++++
>>>  drivers/staging/riscv/kvm/vcpu_timer.c        | 225 ++++
>>>  drivers/staging/riscv/kvm/vm.c                |  81 ++
>>>  drivers/staging/riscv/kvm/vmid.c              | 120 +++
>>>  include/clocksource/timer-riscv.h             |  16 +
>>>  include/uapi/linux/kvm.h                      |   8 +
>>>  25 files changed, 4706 insertions(+), 9 deletions(-)
>>>  create mode 100644 arch/riscv/include/uapi/asm/kvm.h
>>>  create mode 100644 drivers/staging/riscv/kvm/Kconfig
>>>  create mode 100644 drivers/staging/riscv/kvm/Makefile
>>>  create mode 100644 drivers/staging/riscv/kvm/asm/kvm_csr.h
>>>  create mode 100644 drivers/staging/riscv/kvm/asm/kvm_host.h
>>>  create mode 100644 drivers/staging/riscv/kvm/asm/kvm_types.h
>>>  create mode 100644 drivers/staging/riscv/kvm/asm/kvm_vcpu_timer.h
>>>  create mode 100644 drivers/staging/riscv/kvm/main.c
>>>  create mode 100644 drivers/staging/riscv/kvm/mmu.c
>>>  create mode 100644 drivers/staging/riscv/kvm/riscv_offsets.c
>>>  create mode 100644 drivers/staging/riscv/kvm/tlb.S
>>>  create mode 100644 drivers/staging/riscv/kvm/vcpu.c
>>>  create mode 100644 drivers/staging/riscv/kvm/vcpu_exit.c
>>>  create mode 100644 drivers/staging/riscv/kvm/vcpu_sbi.c
>>>  create mode 100644 drivers/staging/riscv/kvm/vcpu_switch.S
>>>  create mode 100644 drivers/staging/riscv/kvm/vcpu_timer.c
>>>  create mode 100644 drivers/staging/riscv/kvm/vm.c
>>>  create mode 100644 drivers/staging/riscv/kvm/vmid.c
>>>  create mode 100644 include/clocksource/timer-riscv.h
>>>
>>> --
>>> 2.25.1
>>>
>
Greg Kroah-Hartman May 25, 2021, 7:37 a.m. UTC | #20
On Mon, May 24, 2021 at 11:08:30PM +0000, Damien Le Moal wrote:
> On 2021/05/25 7:57, Palmer Dabbelt wrote:
> > On Mon, 24 May 2021 00:09:45 PDT (-0700), guoren@kernel.org wrote:
> >> Thx Anup,
> >>
> >> Tested-by: Guo Ren <guoren@kernel.org> (Just on qemu-rv64)
> >>
> >> I'm following your KVM patchset and it's a great job for riscv
> >> H-extension. I think hardware companies hope Linux KVM ready first
> >> before the real chip. That means we can ensure the hardware could run
> >> mainline linux.
> > 
> > I understand that it would be wonderful for hardware vendors to have a 
> > guarantee that their hardware will be supported by the software 
> > ecosystem, but that's not what we're talking about here.  Specifically, 
> > the proposal for this code is to track the latest draft extension, which 
> > would specifically leave vendors who implement the current draft out in 
> > the cold should something change.  In practice that is the only way to 
> > move forward with any draft extension that doesn't have hardware 
> > available, as the software RISC-V implementations rapidly deprecate 
> > draft extensions and without a way to test our code it is destined to 
> > bit rot.
> 
> To facilitate the process of implementing, and updating, against draft
> specifications, I proposed to have arch/riscv/staging added. This would be the
> place to put code based on drafts. Some simple rules can be put in place:
> 1) The code and eventual ABI may change any time, no guarantees of backward
> compatibility
> 2) Once the specifications are frozen, the code is moved out of staging
> somewhere else.
> 3) The code may be removed any time if the specification proposal is dropped, or
> any other valid reason (can't think of any other right now)
> 4) ...
> 
> This way, the implementation process would be greatly facilitated and
> interactions between different extensions can be explored much more easily.
> 
> Thoughts ?

It will not work, unless you are mean and ruthless and people will get
mad at you.  I do not recommend it at all.

Once code shows up in the kernel tree, and people rely on it, you now
_have_ to support it.  Users don't know the difference between "staging
or not staging" at all.  We have reported problems of staging media
drivers breaking userspace apps and people having problems with that,
despite the media developers trying to tell the world, "DO NOT RELY ON
THESE!".

And if this can't be done with tiny simple single drivers, you are going
to have a world-of-hurt if you put arch/platform support into
arch/riscv/.  Once it's there, you will never be able to delete it,
trust me.

If you REALLY wanted to do this, you could create drivers/staging/riscv/
and try to make the following rules:

	- stand-alone code only, can not depend on ANYTHING outside of
	  the directory that is not also used by other in-kernel code
	- does not expose any userspace apis
	- interacts only with existing in-kernel code.
	- can be deleted at any time, UNLESS someone is using it for
	  functionality on a system

But what use would that be?  What could you put into there that anyone
would be able to actually use?

Remember the rule we made to our user community over 15 years ago:

	We will not break userspace functionality*

With the caveat of "* - in a way that you notice".

That means we can remove and change things that no one relies on
anymore, as long as if someone pops up that does rely on it, we put it
back.

We do this because we never want anyone to be afraid to drop in a new
kernel, because they know we did not break their existing hardware and
userspace workloads.  And if we did, we will work quickly to fix it.

So back to the original issue here, what is the problem that you are
trying to solve?  Why do you want to have in-kernel code for hardware
that no one else can have access to, and that isn't part of a "finalized
spec" that ends up touching other subsystems and is not self-contained?

Why not take the energy here and go get that spec ratified so we aren't
having this argument anymore?  What needs to be done to make that happen
and why hasn't anyone done that?  There's nothing keeping kernel
developers from working on spec groups, right?

thanks,

greg k-h
Damien Le Moal May 25, 2021, 8:01 a.m. UTC | #21
On 2021/05/25 16:37, Greg KH wrote:
> On Mon, May 24, 2021 at 11:08:30PM +0000, Damien Le Moal wrote:
>> On 2021/05/25 7:57, Palmer Dabbelt wrote:
>>> On Mon, 24 May 2021 00:09:45 PDT (-0700), guoren@kernel.org wrote:
>>>> Thx Anup,
>>>>
>>>> Tested-by: Guo Ren <guoren@kernel.org> (Just on qemu-rv64)
>>>>
>>>> I'm following your KVM patchset and it's a great job for riscv
>>>> H-extension. I think hardware companies hope Linux KVM ready first
>>>> before the real chip. That means we can ensure the hardware could run
>>>> mainline linux.
>>>
>>> I understand that it would be wonderful for hardware vendors to have a 
>>> guarantee that their hardware will be supported by the software 
>>> ecosystem, but that's not what we're talking about here.  Specifically, 
> >>> the proposal for this code is to track the latest draft extension, which 
> >>> would specifically leave vendors who implement the current draft out in 
> >>> the cold should something change.  In practice that is the only way to 
>>> move forward with any draft extension that doesn't have hardware 
>>> available, as the software RISC-V implementations rapidly deprecate 
>>> draft extensions and without a way to test our code it is destined to 
>>> bit rot.
>>
>> To facilitate the process of implementing, and updating, against draft
>> specifications, I proposed to have arch/riscv/staging added. This would be the
>> place to put code based on drafts. Some simple rules can be put in place:
>> 1) The code and eventual ABI may change any time, no guarantees of backward
>> compatibility
>> 2) Once the specifications are frozen, the code is moved out of staging
>> somewhere else.
>> 3) The code may be removed any time if the specification proposal is dropped, or
>> any other valid reason (can't think of any other right now)
>> 4) ...
>>
>> This way, the implementation process would be greatly facilitated and
>> interactions between different extensions can be explored much more easily.
>>
>> Thoughts ?
> 
> It will not work, unless you are mean and ruthless and people will get
> mad at you.  I do not recommend it at all.
> 
> Once code shows up in the kernel tree, and people rely on it, you now
> _have_ to support it.  Users don't know the difference between "staging
> or not staging" at all.  We have reported problems of staging media
> drivers breaking userspace apps and people having problems with that,
> despite the media developers trying to tell the world, "DO NOT RELY ON
> THESE!".
> 
> And if this can't be done with tiny simple single drivers, you are going
> to have a world-of-hurt if you put arch/platform support into
> arch/riscv/.  Once it's there, you will never be able to delete it,
> trust me.

All very good points. Thank you for sharing.

> If you REALLY wanted to do this, you could create drivers/staging/riscv/
> and try to make the following rules:
> 
> 	- stand-alone code only, can not depend on ANYTHING outside of
> 	  the directory that is not also used by other in-kernel code
> 	- does not expose any userspace apis
> 	- interacts only with existing in-kernel code.
> 	- can be deleted at any time, UNLESS someone is using it for
> 	  functionality on a system
> 
> But what use would that be?  What could you put into there that anyone
> would be able to actually use?

Yes, you already mentioned this and we were not thinking about this solution.
drivers/staging really is for device drivers and does not apply to arch code.

> Remember the rule we made to our user community over 15 years ago:
> 
> 	We will not break userspace functionality*
> 
> With the caveat of "* - in a way that you notice".
> 
> That means we can remove and change things that no one relies on
> anymore, as long as if someone pops up that does rely on it, we put it
> back.
> 
> We do this because we never want anyone to be afraid to drop in a new
> kernel, because they know we did not break their existing hardware and
> userspace workloads.  And if we did, we will work quickly to fix it.

Yes, I am well aware of this rule.

> So back to the original issue here, what is the problem that you are
> trying to solve?  Why do you want to have in-kernel code for hardware
> that no one else can have access to, and that isn't part of a "finalized
> spec" that ends up touching other subsystems and is not self-contained?

For the case at hand, the only thing that would be outside of the staging area
would be the ABI definition, but that one depends only on the ratified riscv ISA
specs. So having it outside of staging would be OK. The idea of the arch staging
area is 2 fold:
1) facilitate the development work overall, both for Paolo and Anup on the KVM
part, but also others to check that their changes do not break KVM support.
2) Provide feedback to the specs groups that their concerns are moot. E.g. one
reason the hypervisor specs are being delayed is concerns with interrupt
handling. With a working implementation based on current ratified specs for
other components (e.g. interrupt controller), the hope is that the specs group
can speed up freezing of the specs.

But your points about how users will likely end up using this potentially
creates a lot more problems than we are solving...

> Why not take the energy here and go get that spec ratified so we aren't
> having this argument anymore?  What needs to be done to make that happen
> and why hasn't anyone done that?  There's nothing keeping kernel
> developers from working on spec groups, right?

We are participating and giving arguments for freezing the specs. This is
however not working as we would like. But that is a problem to be addressed with
RISCV International and the processes governing the operation of specification
groups. The linux mailing lists are not the right place to discuss this, so I
will not go into more details.

Thank you for the feedback.

Best regards.
Greg Kroah-Hartman May 25, 2021, 8:11 a.m. UTC | #22
On Tue, May 25, 2021 at 08:01:01AM +0000, Damien Le Moal wrote:
> On 2021/05/25 16:37, Greg KH wrote:
> > On Mon, May 24, 2021 at 11:08:30PM +0000, Damien Le Moal wrote:
> >> On 2021/05/25 7:57, Palmer Dabbelt wrote:
> >>> On Mon, 24 May 2021 00:09:45 PDT (-0700), guoren@kernel.org wrote:
> >>>> Thx Anup,
> >>>>
> >>>> Tested-by: Guo Ren <guoren@kernel.org> (Just on qemu-rv64)
> >>>>
> >>>> I'm following your KVM patchset and it's a great job for riscv
> >>>> H-extension. I think hardware companies hope Linux KVM ready first
> >>>> before the real chip. That means we can ensure the hardware could run
> >>>> mainline linux.
> >>>
> >>> I understand that it would be wonderful for hardware vendors to have a 
> >>> guarantee that their hardware will be supported by the software 
> >>> ecosystem, but that's not what we're talking about here.  Specifically, 
> >>> the proposal for this code is to track the latest draft extension, which 
> >>> would specifically leave vendors who implement the current draft out in 
> >>> the cold should something change.  In practice that is the only way to 
> >>> move forward with any draft extension that doesn't have hardware 
> >>> available, as the software RISC-V implementations rapidly deprecate 
> >>> draft extensions and without a way to test our code it is destined to 
> >>> bit rot.
> >>
> >> To facilitate the process of implementing, and updating, against draft
> >> specifications, I proposed to have arch/riscv/staging added. This would be the
> >> place to put code based on drafts. Some simple rules can be put in place:
> >> 1) The code and eventual ABI may change any time, no guarantees of backward
> >> compatibility
> >> 2) Once the specifications are frozen, the code is moved out of staging
> >> somewhere else.
> >> 3) The code may be removed any time if the specification proposal is dropped, or
> >> any other valid reason (can't think of any other right now)
> >> 4) ...
> >>
> >> This way, the implementation process would be greatly facilitated and
> >> interactions between different extensions can be explored much more easily.
> >>
> >> Thoughts ?
> > 
> > It will not work, unless you are mean and ruthless and people will get
> > mad at you.  I do not recommend it at all.
> > 
> > Once code shows up in the kernel tree, and people rely on it, you now
> > _have_ to support it.  Users don't know the difference between "staging
> > or not staging" at all.  We have reported problems of staging media
> > drivers breaking userspace apps and people having problems with that,
> > despite the media developers trying to tell the world, "DO NOT RELY ON
> > THESE!".
> > 
> > And if this can't be done with tiny simple single drivers, you are going
> > to have a world-of-hurt if you put arch/platform support into
> > arch/riscv/.  Once it's there, you will never be able to delete it,
> > trust me.
> 
> All very good points. Thank you for sharing.
> 
> > If you REALLY wanted to do this, you could create drivers/staging/riscv/
> > and try to make the following rules:
> > 
> > 	- stand-alone code only, can not depend on ANYTHING outside of
> > 	  the directory that is not also used by other in-kernel code
> > 	- does not expose any userspace apis
> > 	- interacts only with existing in-kernel code.
> > 	- can be deleted at any time, UNLESS someone is using it for
> > 	  functionality on a system
> > 
> > But what use would that be?  What could you put into there that anyone
> > would be able to actually use?
> 
> Yes, you already mentioned this and we were not thinking about this solution.
> drivers/staging really is for device drivers and does not apply to arch code.

Then you can not use the "staging model" anywhere else, especially in
arch code.  We tried that many years ago, and it instantly failed and we
ripped it out.  Learn from our mistakes please.

> > So back to the original issue here, what is the problem that you are
> > trying to solve?  Why do you want to have in-kernel code for hardware
> > that no one else can have access to, and that isn't part of a "finalized
> > spec" that ends up touching other subsystems and is not self-contained?
> 
> For the case at hand, the only thing that would be outside of the staging area
> would be the ABI definition, but that one depends only on the ratified riscv ISA
> specs. So having it outside of staging would be OK. The idea of the arch staging
> area is 2 fold:
> 1) facilitate the development work overall, both for Paolo and Anup on the KVM
> part, but also others to check that their changes do not break KVM support.

Who are the "others" here?  You can't force your code into the tree just
to keep it up to date with internal apis that others are changing, if
you have no real users for it yet.  That's asking others to do your work
for you :(

> 2) Provide feedback to the specs groups that their concerns are moot. E.g. one
> reason the hypervisor specs are being delayed is concerns with interrupt
> handling. With a working implementation based on current ratified specs for
> other components (e.g. interrupt controller), the hope is that the specs group
> can speed up freezing of the specs.

There is the issue of specs-without-working-code that can cause major
problems.  But you have code, it does not have to be merged into the
kernel tree to prove/disprove specs, so don't push the inability of your
standards group to come to an agreement to the kernel developer
community.  Again, you are making us do your work for you here :(

> But your points about how users will likely end up using this potentially
> creates a lot more problems than we are solving...

Thank you for understanding.

good luck with your standards meetings!

greg k-h
Paolo Bonzini May 25, 2021, 8:24 a.m. UTC | #23
On 25/05/21 10:11, Greg KH wrote:
>> 1) facilitate the development work overall, both for Paolo and Anup on the KVM
>> part, but also others to check that their changes do not break KVM support.
> 
> Who are the "others" here?  You can't force your code into the tree just
> to keep it up to date with internal apis that others are changing, if
> you have no real users for it yet.  That's asking others to do your work
> for you :(

I don't know about changes that would break KVM support.  However, 
"other KVM developers" would be able to check that their changes do not 
break the RISC-V implementation, and I would certainly either enforce 
that or do the work myself.

Also, excluding simulators and emulators from the set of "real users" 
ignores the needs of userspace developers, as well as other uses such as 
education/academia.  Linux for x86 (both KVM and bare metal) supports 
features that are only available in emulators and simulators which are 
not even free software.  I am pretty sure that there would be more users 
of KVM/RISC-V than with KVM/MIPS, despite the latter having support in 
real hardware.

Paolo