[v6,7/7] Documentation: KVM: Add hypercall for LoongArch

Message ID	20240302084724.1415344-1-maobibo@loongson.cn (mailing list archive)
State	New, archived
Headers	show Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 8B0AE79F0; Sat, 2 Mar 2024 08:47:32 +0000 (UTC) From: Bibo Mao <maobibo@loongson.cn> To: Tianrui Zhao <zhaotianrui@loongson.cn>, Juergen Gross <jgross@suse.com>, Paolo Bonzini <pbonzini@redhat.com>, Jonathan Corbet <corbet@lwn.net> Cc: loongarch@lists.linux.dev, linux-kernel@vger.kernel.org, virtualization@lists.linux.dev, kvm@vger.kernel.org Subject: [PATCH v6 7/7] Documentation: KVM: Add hypercall for LoongArch Date: Sat, 2 Mar 2024 16:47:24 +0800 Message-Id: <20240302084724.1415344-1-maobibo@loongson.cn> In-Reply-To: <20240302082532.1415200-1-maobibo@loongson.cn> References: <20240302082532.1415200-1-maobibo@loongson.cn> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	LoongArch: Add pv ipi support on LoongArch VM \| expand [v6,0/7] LoongArch: Add pv ipi support on LoongArch VM [v6,1/7] LoongArch/smp: Refine some ipi functions on LoongArch platform [v6,2/7] LoongArch: KVM: Add hypercall instruction emulation support [v6,3/7] LoongArch: KVM: Add cpucfg area for kvm hypervisor [v6,4/7] LoongArch: KVM: Add vcpu search support from physical cpuid [v6,5/7] LoongArch: KVM: Add pv ipi support on kvm side [v6,6/7] LoongArch: Add pv ipi support on guest kernel side [v6,7/7] Documentation: KVM: Add hypercall for LoongArch

Bibo Mao March 2, 2024, 8:47 a.m. UTC

Add documentation topic for using pv_virt when running as a guest
on KVM hypervisor.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 Documentation/virt/kvm/index.rst              |  1 +
 .../virt/kvm/loongarch/hypercalls.rst         | 79 +++++++++++++++++++
 Documentation/virt/kvm/loongarch/index.rst    | 10 +++
 3 files changed, 90 insertions(+)
 create mode 100644 Documentation/virt/kvm/loongarch/hypercalls.rst
 create mode 100644 Documentation/virt/kvm/loongarch/index.rst

WANG Xuerui March 2, 2024, 9:41 a.m. UTC | #1

On 3/2/24 16:47, Bibo Mao wrote:
> Add documentation topic for using pv_virt when running as a guest
> on KVM hypervisor.
> 
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
>   Documentation/virt/kvm/index.rst              |  1 +
>   .../virt/kvm/loongarch/hypercalls.rst         | 79 +++++++++++++++++++
>   Documentation/virt/kvm/loongarch/index.rst    | 10 +++
>   3 files changed, 90 insertions(+)
>   create mode 100644 Documentation/virt/kvm/loongarch/hypercalls.rst
>   create mode 100644 Documentation/virt/kvm/loongarch/index.rst
> 
> diff --git a/Documentation/virt/kvm/index.rst b/Documentation/virt/kvm/index.rst
> index ad13ec55ddfe..9ca5a45c2140 100644
> --- a/Documentation/virt/kvm/index.rst
> +++ b/Documentation/virt/kvm/index.rst
> @@ -14,6 +14,7 @@ KVM
>      s390/index
>      ppc-pv
>      x86/index
> +   loongarch/index
>   
>      locking
>      vcpu-requests
> diff --git a/Documentation/virt/kvm/loongarch/hypercalls.rst b/Documentation/virt/kvm/loongarch/hypercalls.rst
> new file mode 100644
> index 000000000000..1679e48d67d2
> --- /dev/null
> +++ b/Documentation/virt/kvm/loongarch/hypercalls.rst
> @@ -0,0 +1,79 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===================================
> +The LoongArch paravirtual interface
> +===================================
> +
> +KVM hypercalls use the HVCL instruction with code 0x100, and the hypercall
> +number is put in a0 and up to five arguments may be placed in a1-a5, the
> +return value is placed in v0 (alias with a0).

Just say a0: the name v0 is long deprecated (has been the case ever 
since LoongArch got mainlined).

> +
> +The code for that interface can be found in arch/loongarch/kvm/*
> +
> +Querying for existence
> +======================
> +
> +To find out if we're running on KVM or not, cpucfg can be used with index
> +CPUCFG_KVM_BASE (0x40000000), cpucfg range between 0x40000000 - 0x400000FF
> +is marked as a specially reserved range. All existing and future processors
> +will not implement any features in this range.
> +
> +When Linux is running on KVM, cpucfg with index CPUCFG_KVM_BASE (0x40000000)
> +returns magic string "KVM\0"
> +
> +Once you determined you're running under a PV capable KVM, you can now use
> +hypercalls as described below.

So this is still the approach similar to the x86 CPUID-based 
implementation. But here the non-privileged behavior isn't specified -- 
I see there is PLV checking in Patch 3 but it's safer to have the 
requirement spelled out here too.

But I still think this approach touches more places than strictly 
needed. As it is currently the case in 
arch/loongarch/kernel/cpu-probe.c, the FEATURES IOCSR is checked for a 
bit IOCSRF_VM that already signifies presence of a hypervisor; if this 
information can be interpreted as availability of the HVCL instruction 
(which I suppose is the case -- a hypervisor can always trap-and-emulate 
in case HVCL isn't provided by hardware), here we can already start 
making calls with HVCL.

We can and should define a uniform interface for probing the hypervisor 
kind, similar to the centrally-managed RISC-V SBI implementation ID 
registry [1]: otherwise future non-KVM hypervisors would have to

1. somehow pretend they are KVM and eventually fail to do so, leading to 
subtle incompatibilities,
2. invent another way of probing for their existence,
3. piggy-back on the current KVM definition, which is inelegant (reading 
the LoongArch-KVM-defined CPUCFG leaf only to find it's not KVM) and 
utterly makes the definition here *not* KVM-specific.

[1]: 
https://github.com/riscv-non-isa/riscv-sbi-doc/blob/v2.0/src/ext-base.adoc

My take on this:

To check if we are running on Linux KVM or not, first check IOCSR 0x8 
(``LOONGARCH_IOCSR_FEATURES``) for bit 11 (``IOCSRF_VM``); we are 
running under a hypervisor if the bit is set. Then invoke ``HVCL 0`` to 
find out the hypervisor implementation ID; a return value in ``$a0`` of 
0x004d564b (``KVM\0``) means Linux KVM, in which case the rest of the 
convention applies.

> +
> +KVM hypercall ABI
> +=================
> +
> +Hypercall ABI on KVM is simple, only one scratch register a0 (v0) and at most
> +five generic registers used as input parameter. FP register and vector register
> +is not used for input register and should not be modified during hypercall.
> +Hypercall function can be inlined since there is only one scratch register.

It should be pointed out explicitly that on hypercall return all 
architectural state except ``$a0`` is preserved. Or is the whole ``$a0 - 
$t8`` range clobbered, just like with Linux syscalls?

> +
> +The parameters are as follows:
> +
> +        ========	================	================
> +	Register	IN			OUT
> +        ========	================	================
> +	a0		function number		Return code
> +	a1		1st parameter		-
> +	a2		2nd parameter		-
> +	a3		3rd parameter		-
> +	a4		4th parameter		-
> +	a5		5th parameter		-
> +        ========	================	================
> +
> +Return codes can be as follows:
> +
> +	====		=========================
> +	Code		Meaning
> +	====		=========================
> +	0		Success
> +	-1		Hypercall not implemented
> +	-2		Hypercall parameter error

What about re-using well-known errno's, like -ENOSYS for "hypercall not 
implemented" and -EINVAL for "invalid parameter"? This could save people 
some hair when more error codes are added in the future.

> +	====		=========================
> +
> +KVM Hypercalls Documentation
> +============================
> +
> +The template for each hypercall is:
> +1. Hypercall name
> +2. Purpose
> +
> +1. KVM_HCALL_FUNC_PV_IPI
> +------------------------
> +
> +:Purpose: Send IPIs to multiple vCPUs.
> +
> +- a0: KVM_HCALL_FUNC_PV_IPI
> +- a1: lower part of the bitmap of destination physical CPUIDs
> +- a2: higher part of the bitmap of destination physical CPUIDs
> +- a3: the lowest physical CPUID in bitmap

"CPU ID", instead of "CPUID" for clarity: I suppose most people reading 
this also know about x86, so "CPUID" could evoke the wrong intuition.

This function is equivalent to the C signature "void hypcall(int func, 
u128 mask, int lowest_cpu_id)", which I think is fine, but one can also 
see that the return value description is missing.

> +
> +The hypercall lets a guest send multicast IPIs, with at most 128
> +destinations per hypercall.  The destinations are represented by a bitmap
> +contained in the first two arguments (a1 and a2). Bit 0 of a1 corresponds
> +to the physical CPUID in the third argument (a3), bit 1 corresponds to the
> +physical ID a3+1, and so on.
> diff --git a/Documentation/virt/kvm/loongarch/index.rst b/Documentation/virt/kvm/loongarch/index.rst
> new file mode 100644
> index 000000000000..83387b4c5345
> --- /dev/null
> +++ b/Documentation/virt/kvm/loongarch/index.rst
> @@ -0,0 +1,10 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=========================
> +KVM for LoongArch systems
> +=========================
> +
> +.. toctree::
> +   :maxdepth: 2
> +
> +   hypercalls.rst

Bibo Mao March 4, 2024, 9:10 a.m. UTC | #2

On 2024/3/2 下午5:41, WANG Xuerui wrote:
> On 3/2/24 16:47, Bibo Mao wrote:
>> Add documentation topic for using pv_virt when running as a guest
>> on KVM hypervisor.
>>
>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>> ---
>>   Documentation/virt/kvm/index.rst              |  1 +
>>   .../virt/kvm/loongarch/hypercalls.rst         | 79 +++++++++++++++++++
>>   Documentation/virt/kvm/loongarch/index.rst    | 10 +++
>>   3 files changed, 90 insertions(+)
>>   create mode 100644 Documentation/virt/kvm/loongarch/hypercalls.rst
>>   create mode 100644 Documentation/virt/kvm/loongarch/index.rst
>>
>> diff --git a/Documentation/virt/kvm/index.rst 
>> b/Documentation/virt/kvm/index.rst
>> index ad13ec55ddfe..9ca5a45c2140 100644
>> --- a/Documentation/virt/kvm/index.rst
>> +++ b/Documentation/virt/kvm/index.rst
>> @@ -14,6 +14,7 @@ KVM
>>      s390/index
>>      ppc-pv
>>      x86/index
>> +   loongarch/index
>>      locking
>>      vcpu-requests
>> diff --git a/Documentation/virt/kvm/loongarch/hypercalls.rst 
>> b/Documentation/virt/kvm/loongarch/hypercalls.rst
>> new file mode 100644
>> index 000000000000..1679e48d67d2
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/loongarch/hypercalls.rst
>> @@ -0,0 +1,79 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +===================================
>> +The LoongArch paravirtual interface
>> +===================================
>> +
>> +KVM hypercalls use the HVCL instruction with code 0x100, and the 
>> hypercall
>> +number is put in a0 and up to five arguments may be placed in a1-a5, the
>> +return value is placed in v0 (alias with a0).
> 
> Just say a0: the name v0 is long deprecated (has been the case ever 
> since LoongArch got mainlined).
> 
Sure, will modify since you are compiler export :)

>> +
>> +The code for that interface can be found in arch/loongarch/kvm/*
>> +
>> +Querying for existence
>> +======================
>> +
>> +To find out if we're running on KVM or not, cpucfg can be used with 
>> index
>> +CPUCFG_KVM_BASE (0x40000000), cpucfg range between 0x40000000 - 
>> 0x400000FF
>> +is marked as a specially reserved range. All existing and future 
>> processors
>> +will not implement any features in this range.
>> +
>> +When Linux is running on KVM, cpucfg with index CPUCFG_KVM_BASE 
>> (0x40000000)
>> +returns magic string "KVM\0"
>> +
>> +Once you determined you're running under a PV capable KVM, you can 
>> now use
>> +hypercalls as described below.
> 
> So this is still the approach similar to the x86 CPUID-based 
> implementation. But here the non-privileged behavior isn't specified -- 
> I see there is PLV checking in Patch 3 but it's safer to have the 
> requirement spelled out here too.
> 
> But I still think this approach touches more places than strictly 
> needed. As it is currently the case in 
> arch/loongarch/kernel/cpu-probe.c, the FEATURES IOCSR is checked for a 
> bit IOCSRF_VM that already signifies presence of a hypervisor; if this 
> information can be interpreted as availability of the HVCL instruction 
> (which I suppose is the case -- a hypervisor can always trap-and-emulate 
> in case HVCL isn't provided by hardware), here we can already start 
> making calls with HVCL.
> 
> We can and should define a uniform interface for probing the hypervisor 
> kind, similar to the centrally-managed RISC-V SBI implementation ID 
> registry [1]: otherwise future non-KVM hypervisors would have to
> 
> 1. somehow pretend they are KVM and eventually fail to do so, leading to 
> subtle incompatibilities,
> 2. invent another way of probing for their existence,
> 3. piggy-back on the current KVM definition, which is inelegant (reading 
> the LoongArch-KVM-defined CPUCFG leaf only to find it's not KVM) and 
> utterly makes the definition here *not* KVM-specific.
> 
> [1]: 
> https://github.com/riscv-non-isa/riscv-sbi-doc/blob/v2.0/src/ext-base.adoc
> 
Sorry, I know nothing about riscv. Can you describe how sbi_get_mimpid() 
is implemented in detailed? Is it a simple library or need trap into 
secure mode or need trap into hypervisor mode?

> My take on this:
> 
> To check if we are running on Linux KVM or not, first check IOCSR 0x8 
> (``LOONGARCH_IOCSR_FEATURES``) for bit 11 (``IOCSRF_VM``); we are 
> running under a hypervisor if the bit is set. Then invoke ``HVCL 0`` to 
> find out the hypervisor implementation ID; a return value in ``$a0`` of 
> 0x004d564b (``KVM\0``) means Linux KVM, in which case the rest of the 
> convention applies.
> 
I do not think so. `HVCL 0` requires that hypercall ABIs need be unified 
for all hypervisors. Instead it is not necessary, each hypervisor can 
has its own hypercall ABI.

>> +
>> +KVM hypercall ABI
>> +=================
>> +
>> +Hypercall ABI on KVM is simple, only one scratch register a0 (v0) and 
>> at most
>> +five generic registers used as input parameter. FP register and 
>> vector register
>> +is not used for input register and should not be modified during 
>> hypercall.
>> +Hypercall function can be inlined since there is only one scratch 
>> register.
> 
> It should be pointed out explicitly that on hypercall return all 
Well, return value description will added. What do think about the 
meaning of return value for KVM_HCALL_FUNC_PV_IPI hypercall?  The number 
of CPUs with IPI delivered successfully like kvm x86 or simply 
success/failure?
> architectural state except ``$a0`` is preserved. Or is the whole ``$a0 - 
> $t8`` range clobbered, just like with Linux syscalls?
> 
what is advantage with $a0 - > $t8 clobbered?

It seems that with linux Loongarch syscall, t0--t8 are clobber rather 
than a0-t8. Am I wrong?

>> +
>> +The parameters are as follows:
>> +
>> +        ========    ================    ================
>> +    Register    IN            OUT
>> +        ========    ================    ================
>> +    a0        function number        Return code
>> +    a1        1st parameter        -
>> +    a2        2nd parameter        -
>> +    a3        3rd parameter        -
>> +    a4        4th parameter        -
>> +    a5        5th parameter        -
>> +        ========    ================    ================
>> +
>> +Return codes can be as follows:
>> +
>> +    ====        =========================
>> +    Code        Meaning
>> +    ====        =========================
>> +    0        Success
>> +    -1        Hypercall not implemented
>> +    -2        Hypercall parameter error
> 
> What about re-using well-known errno's, like -ENOSYS for "hypercall not 
> implemented" and -EINVAL for "invalid parameter"? This could save people 
> some hair when more error codes are added in the future.
> 
No, I do not think so. Here is hypercall return value, some OS need see 
it. -ENOSYS/-EINVAL may be not understandable for non-Linux OS.

>> +    ====        =========================
>> +
>> +KVM Hypercalls Documentation
>> +============================
>> +
>> +The template for each hypercall is:
>> +1. Hypercall name
>> +2. Purpose
>> +
>> +1. KVM_HCALL_FUNC_PV_IPI
>> +------------------------
>> +
>> +:Purpose: Send IPIs to multiple vCPUs.
>> +
>> +- a0: KVM_HCALL_FUNC_PV_IPI
>> +- a1: lower part of the bitmap of destination physical CPUIDs
>> +- a2: higher part of the bitmap of destination physical CPUIDs
>> +- a3: the lowest physical CPUID in bitmap
> 
> "CPU ID", instead of "CPUID" for clarity: I suppose most people reading 
> this also know about x86, so "CPUID" could evoke the wrong intuition.
> 
Both "CPU core id" or "CPUID" are ok for me since there is csr register 
named LOONGARCH_CSR_CPUID already.

> This function is equivalent to the C signature "void hypcall(int func, 
> u128 mask, int lowest_cpu_id)", which I think is fine, but one can also 
> see that the return value description is missing.
> 
Sure, the return value description will added.

And it is not equivalent to the C signature "void hypcall(int func, u128 
mask, int lowest_cpu_id)". int/u128/stucture is not permitted with 
hypercall ABI, all parameter is "unsigned long".

Regards
Bibo Mao

>> +
>> +The hypercall lets a guest send multicast IPIs, with at most 128
>> +destinations per hypercall.  The destinations are represented by a 
>> bitmap
>> +contained in the first two arguments (a1 and a2). Bit 0 of a1 
>> corresponds
>> +to the physical CPUID in the third argument (a3), bit 1 corresponds 
>> to the
>> +physical ID a3+1, and so on.
>> diff --git a/Documentation/virt/kvm/loongarch/index.rst 
>> b/Documentation/virt/kvm/loongarch/index.rst
>> new file mode 100644
>> index 000000000000..83387b4c5345
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/loongarch/index.rst
>> @@ -0,0 +1,10 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +=========================
>> +KVM for LoongArch systems
>> +=========================
>> +
>> +.. toctree::
>> +   :maxdepth: 2
>> +
>> +   hypercalls.rst
>

WANG Xuerui March 5, 2024, 6:26 p.m. UTC | #3

On 3/4/24 17:10, maobibo wrote:
> On 2024/3/2 下午5:41, WANG Xuerui wrote:
>> On 3/2/24 16:47, Bibo Mao wrote:
>>> [snip]
>>> +Querying for existence
>>> +======================
>>> +
>>> +To find out if we're running on KVM or not, cpucfg can be used with 
>>> index
>>> +CPUCFG_KVM_BASE (0x40000000), cpucfg range between 0x40000000 - 
>>> 0x400000FF
>>> +is marked as a specially reserved range. All existing and future 
>>> processors
>>> +will not implement any features in this range.
>>> +
>>> +When Linux is running on KVM, cpucfg with index CPUCFG_KVM_BASE 
>>> (0x40000000)
>>> +returns magic string "KVM\0"
>>> +
>>> +Once you determined you're running under a PV capable KVM, you can 
>>> now use
>>> +hypercalls as described below.
>>
>> So this is still the approach similar to the x86 CPUID-based 
>> implementation. But here the non-privileged behavior isn't specified 
>> -- I see there is PLV checking in Patch 3 but it's safer to have the 
>> requirement spelled out here too.
>>
>> But I still think this approach touches more places than strictly 
>> needed. As it is currently the case in 
>> arch/loongarch/kernel/cpu-probe.c, the FEATURES IOCSR is checked for a 
>> bit IOCSRF_VM that already signifies presence of a hypervisor; if this 
>> information can be interpreted as availability of the HVCL instruction 
>> (which I suppose is the case -- a hypervisor can always 
>> trap-and-emulate in case HVCL isn't provided by hardware), here we can 
>> already start making calls with HVCL.
>>
>> We can and should define a uniform interface for probing the 
>> hypervisor kind, similar to the centrally-managed RISC-V SBI 
>> implementation ID registry [1]: otherwise future non-KVM hypervisors 
>> would have to
>>
>> 1. somehow pretend they are KVM and eventually fail to do so, leading 
>> to subtle incompatibilities,
>> 2. invent another way of probing for their existence,
>> 3. piggy-back on the current KVM definition, which is inelegant 
>> (reading the LoongArch-KVM-defined CPUCFG leaf only to find it's not 
>> KVM) and utterly makes the definition here *not* KVM-specific.
>>
>> [1]: 
>> https://github.com/riscv-non-isa/riscv-sbi-doc/blob/v2.0/src/ext-base.adoc
>>
> Sorry, I know nothing about riscv. Can you describe how sbi_get_mimpid() 
> is implemented in detailed? Is it a simple library or need trap into 
> secure mode or need trap into hypervisor mode?

For these simple interfaces you can expect trivial implementation. See 
for example [OpenSBI]'s respective code.

[OpenSBI]: 
https://github.com/riscv-software-src/opensbi/blob/v1.4/lib/sbi/sbi_ecall.c#L29-L34

>> My take on this:
>>
>> To check if we are running on Linux KVM or not, first check IOCSR 0x8 
>> (``LOONGARCH_IOCSR_FEATURES``) for bit 11 (``IOCSRF_VM``); we are 
>> running under a hypervisor if the bit is set. Then invoke ``HVCL 0`` 
>> to find out the hypervisor implementation ID; a return value in 
>> ``$a0`` of 0x004d564b (``KVM\0``) means Linux KVM, in which case the 
>> rest of the convention applies.
>>
> I do not think so. `HVCL 0` requires that hypercall ABIs need be unified 
> for all hypervisors. Instead it is not necessary, each hypervisor can 
> has its own hypercall ABI.

I don't think agreeing upon the ABI of HVCL 0 is going to affect ABI of 
other hypercalls. Plus, as long as people don't invent something that 
they think is smart and deviate from the platform calling convention, 
I'd expect every hypervisor to have identical ABI apart from the exact 
HVCL operation ID chosen.

>>> +
>>> +KVM hypercall ABI
>>> +=================
>>> +
>>> +Hypercall ABI on KVM is simple, only one scratch register a0 (v0) 
>>> and at most
>>> +five generic registers used as input parameter. FP register and 
>>> vector register
>>> +is not used for input register and should not be modified during 
>>> hypercall.
>>> +Hypercall function can be inlined since there is only one scratch 
>>> register.
>>
>> It should be pointed out explicitly that on hypercall return all 
> Well, return value description will added. What do think about the 
> meaning of return value for KVM_HCALL_FUNC_PV_IPI hypercall?  The number 
> of CPUs with IPI delivered successfully like kvm x86 or simply 
> success/failure?
>> architectural state except ``$a0`` is preserved. Or is the whole ``$a0 
>> - $t8`` range clobbered, just like with Linux syscalls?
>>
> what is advantage with $a0 - > $t8 clobbered?

Because then a hypercall is going to behave identical as an ordinary C 
function call, which is easy for people and compilers to understand.

> It seems that with linux Loongarch syscall, t0--t8 are clobber rather 
> than a0-t8. Am I wrong?

You're right, my memory has faded a bit. But I think my reasoning still 
holds.

>>> +
>>> +The parameters are as follows:
>>> +
>>> +        ========    ================    ================
>>> +    Register    IN            OUT
>>> +        ========    ================    ================
>>> +    a0        function number        Return code
>>> +    a1        1st parameter        -
>>> +    a2        2nd parameter        -
>>> +    a3        3rd parameter        -
>>> +    a4        4th parameter        -
>>> +    a5        5th parameter        -
>>> +        ========    ================    ================
>>> +
>>> +Return codes can be as follows:
>>> +
>>> +    ====        =========================
>>> +    Code        Meaning
>>> +    ====        =========================
>>> +    0        Success
>>> +    -1        Hypercall not implemented
>>> +    -2        Hypercall parameter error
>>
>> What about re-using well-known errno's, like -ENOSYS for "hypercall 
>> not implemented" and -EINVAL for "invalid parameter"? This could save 
>> people some hair when more error codes are added in the future.
>>
> No, I do not think so. Here is hypercall return value, some OS need see 
> it. -ENOSYS/-EINVAL may be not understandable for non-Linux OS.

As long as you accept the associated costs (documentation, potential 
mapping back-and-forth, proper conveyance of information etc.) I have no 
problem with that either.

>>> +    ====        =========================
>>> +
>>> +KVM Hypercalls Documentation
>>> +============================
>>> +
>>> +The template for each hypercall is:
>>> +1. Hypercall name
>>> +2. Purpose
>>> +
>>> +1. KVM_HCALL_FUNC_PV_IPI
>>> +------------------------
>>> +
>>> +:Purpose: Send IPIs to multiple vCPUs.
>>> +
>>> +- a0: KVM_HCALL_FUNC_PV_IPI
>>> +- a1: lower part of the bitmap of destination physical CPUIDs
>>> +- a2: higher part of the bitmap of destination physical CPUIDs
>>> +- a3: the lowest physical CPUID in bitmap
>>
>> "CPU ID", instead of "CPUID" for clarity: I suppose most people 
>> reading this also know about x86, so "CPUID" could evoke the wrong 
>> intuition.
>>
> Both "CPU core id" or "CPUID" are ok for me since there is csr register 
> named LOONGARCH_CSR_CPUID already.

I was suggesting to minimize confusion even at theoretical level, 
because you cannot assume anything about your readers. Feel free to 
provide extra info (e.g. the "CPU core ID" you suggested) as long as it 
helps to resolve any potential ambiguity / confusion.

>> This function is equivalent to the C signature "void hypcall(int func, 
>> u128 mask, int lowest_cpu_id)", which I think is fine, but one can 
>> also see that the return value description is missing.
>>
> Sure, the return value description will added.
> 
> And it is not equivalent to the C signature "void hypcall(int func, u128 
> mask, int lowest_cpu_id)". int/u128/stucture is not permitted with 
> hypercall ABI, all parameter is "unsigned long".

I was talking about the ABI in a C perspective, and the register usage 
is identical. You can define the KVM hypercall ABI however you want but 
having some nice analogy/equivalence would help a lot, especially for 
people not already familiar with all the details.

Bibo Mao March 6, 2024, 3:28 a.m. UTC | #4

On 2024/3/6 上午2:26, WANG Xuerui wrote:
> On 3/4/24 17:10, maobibo wrote:
>> On 2024/3/2 下午5:41, WANG Xuerui wrote:
>>> On 3/2/24 16:47, Bibo Mao wrote:
>>>> [snip]
>>>> +Querying for existence
>>>> +======================
>>>> +
>>>> +To find out if we're running on KVM or not, cpucfg can be used with 
>>>> index
>>>> +CPUCFG_KVM_BASE (0x40000000), cpucfg range between 0x40000000 - 
>>>> 0x400000FF
>>>> +is marked as a specially reserved range. All existing and future 
>>>> processors
>>>> +will not implement any features in this range.
>>>> +
>>>> +When Linux is running on KVM, cpucfg with index CPUCFG_KVM_BASE 
>>>> (0x40000000)
>>>> +returns magic string "KVM\0"
>>>> +
>>>> +Once you determined you're running under a PV capable KVM, you can 
>>>> now use
>>>> +hypercalls as described below.
>>>
>>> So this is still the approach similar to the x86 CPUID-based 
>>> implementation. But here the non-privileged behavior isn't specified 
>>> -- I see there is PLV checking in Patch 3 but it's safer to have the 
>>> requirement spelled out here too.
>>>
>>> But I still think this approach touches more places than strictly 
>>> needed. As it is currently the case in 
>>> arch/loongarch/kernel/cpu-probe.c, the FEATURES IOCSR is checked for 
>>> a bit IOCSRF_VM that already signifies presence of a hypervisor; if 
>>> this information can be interpreted as availability of the HVCL 
>>> instruction (which I suppose is the case -- a hypervisor can always 
>>> trap-and-emulate in case HVCL isn't provided by hardware), here we 
>>> can already start making calls with HVCL.
>>>
>>> We can and should define a uniform interface for probing the 
>>> hypervisor kind, similar to the centrally-managed RISC-V SBI 
>>> implementation ID registry [1]: otherwise future non-KVM hypervisors 
>>> would have to
>>>
>>> 1. somehow pretend they are KVM and eventually fail to do so, leading 
>>> to subtle incompatibilities,
>>> 2. invent another way of probing for their existence,
>>> 3. piggy-back on the current KVM definition, which is inelegant 
>>> (reading the LoongArch-KVM-defined CPUCFG leaf only to find it's not 
>>> KVM) and utterly makes the definition here *not* KVM-specific.
>>>
>>> [1]: 
>>> https://github.com/riscv-non-isa/riscv-sbi-doc/blob/v2.0/src/ext-base.adoc 
>>>
>>>
>> Sorry, I know nothing about riscv. Can you describe how 
>> sbi_get_mimpid() is implemented in detailed? Is it a simple library or 
>> need trap into secure mode or need trap into hypervisor mode?
> 
> For these simple interfaces you can expect trivial implementation. See 
> for example [OpenSBI]'s respective code.
> 
> [OpenSBI]: 
> https://github.com/riscv-software-src/opensbi/blob/v1.4/lib/sbi/sbi_ecall.c#L29-L34 
> 
> 
>>> My take on this:
>>>
>>> To check if we are running on Linux KVM or not, first check IOCSR 0x8 
>>> (``LOONGARCH_IOCSR_FEATURES``) for bit 11 (``IOCSRF_VM``); we are 
>>> running under a hypervisor if the bit is set. Then invoke ``HVCL 0`` 
>>> to find out the hypervisor implementation ID; a return value in 
>>> ``$a0`` of 0x004d564b (``KVM\0``) means Linux KVM, in which case the 
>>> rest of the convention applies.
>>>
>> I do not think so. `HVCL 0` requires that hypercall ABIs need be 
>> unified for all hypervisors. Instead it is not necessary, each 
>> hypervisor can has its own hypercall ABI.
> 
> I don't think agreeing upon the ABI of HVCL 0 is going to affect ABI of 
> other hypercalls. Plus, as long as people don't invent something that 
> they think is smart and deviate from the platform calling convention, 
> I'd expect every hypervisor to have identical ABI apart from the exact 
> HVCL operation ID chosen.
> 
>>>> +
>>>> +KVM hypercall ABI
>>>> +=================
>>>> +
>>>> +Hypercall ABI on KVM is simple, only one scratch register a0 (v0) 
>>>> and at most
>>>> +five generic registers used as input parameter. FP register and 
>>>> vector register
>>>> +is not used for input register and should not be modified during 
>>>> hypercall.
>>>> +Hypercall function can be inlined since there is only one scratch 
>>>> register.
>>>
>>> It should be pointed out explicitly that on hypercall return all 
>> Well, return value description will added. What do think about the 
>> meaning of return value for KVM_HCALL_FUNC_PV_IPI hypercall?  The 
>> number of CPUs with IPI delivered successfully like kvm x86 or simply 
>> success/failure?
>>> architectural state except ``$a0`` is preserved. Or is the whole 
>>> ``$a0 - $t8`` range clobbered, just like with Linux syscalls?
>>>
>> what is advantage with $a0 - > $t8 clobbered?
> 
> Because then a hypercall is going to behave identical as an ordinary C 
> function call, which is easy for people and compilers to understand.
> 
If you really understand detailed behavior about hypercall/syscall, the 
conclusion may be different.

If T0 - T8 is clobbered with hypercall instruction, hypercall caller 
need save clobbered register, now hypercall exception save/restore all 
the registers during VM exits. If so, hypercall caller need not save 
general registers and it is not necessary scratched for hypercall ABI.

Until now all the discussion the macro level, no detail code level.

Can you show me some example code where T0-T8 need not save/restore 
during LoongArch hypercall exception?

Regards
Bibo Mao

>> It seems that with linux Loongarch syscall, t0--t8 are clobber rather 
>> than a0-t8. Am I wrong?
> 
> You're right, my memory has faded a bit. But I think my reasoning still 
> holds.
> 
>>>> +
>>>> +The parameters are as follows:
>>>> +
>>>> +        ========    ================    ================
>>>> +    Register    IN            OUT
>>>> +        ========    ================    ================
>>>> +    a0        function number        Return code
>>>> +    a1        1st parameter        -
>>>> +    a2        2nd parameter        -
>>>> +    a3        3rd parameter        -
>>>> +    a4        4th parameter        -
>>>> +    a5        5th parameter        -
>>>> +        ========    ================    ================
>>>> +
>>>> +Return codes can be as follows:
>>>> +
>>>> +    ====        =========================
>>>> +    Code        Meaning
>>>> +    ====        =========================
>>>> +    0        Success
>>>> +    -1        Hypercall not implemented
>>>> +    -2        Hypercall parameter error
>>>
>>> What about re-using well-known errno's, like -ENOSYS for "hypercall 
>>> not implemented" and -EINVAL for "invalid parameter"? This could save 
>>> people some hair when more error codes are added in the future.
>>>
>> No, I do not think so. Here is hypercall return value, some OS need 
>> see it. -ENOSYS/-EINVAL may be not understandable for non-Linux OS.
> 
> As long as you accept the associated costs (documentation, potential 
> mapping back-and-forth, proper conveyance of information etc.) I have no 
> problem with that either.
> 
>>>> +    ====        =========================
>>>> +
>>>> +KVM Hypercalls Documentation
>>>> +============================
>>>> +
>>>> +The template for each hypercall is:
>>>> +1. Hypercall name
>>>> +2. Purpose
>>>> +
>>>> +1. KVM_HCALL_FUNC_PV_IPI
>>>> +------------------------
>>>> +
>>>> +:Purpose: Send IPIs to multiple vCPUs.
>>>> +
>>>> +- a0: KVM_HCALL_FUNC_PV_IPI
>>>> +- a1: lower part of the bitmap of destination physical CPUIDs
>>>> +- a2: higher part of the bitmap of destination physical CPUIDs
>>>> +- a3: the lowest physical CPUID in bitmap
>>>
>>> "CPU ID", instead of "CPUID" for clarity: I suppose most people 
>>> reading this also know about x86, so "CPUID" could evoke the wrong 
>>> intuition.
>>>
>> Both "CPU core id" or "CPUID" are ok for me since there is csr 
>> register named LOONGARCH_CSR_CPUID already.
> 
> I was suggesting to minimize confusion even at theoretical level, 
> because you cannot assume anything about your readers. Feel free to 
> provide extra info (e.g. the "CPU core ID" you suggested) as long as it 
> helps to resolve any potential ambiguity / confusion.
> 
>>> This function is equivalent to the C signature "void hypcall(int 
>>> func, u128 mask, int lowest_cpu_id)", which I think is fine, but one 
>>> can also see that the return value description is missing.
>>>
>> Sure, the return value description will added.
>>
>> And it is not equivalent to the C signature "void hypcall(int func, 
>> u128 mask, int lowest_cpu_id)". int/u128/stucture is not permitted 
>> with hypercall ABI, all parameter is "unsigned long".
> 
> I was talking about the ABI in a C perspective, and the register usage 
> is identical. You can define the KVM hypercall ABI however you want but 
> having some nice analogy/equivalence would help a lot, especially for 
> people not already familiar with all the details.
>

WANG Xuerui March 6, 2024, 9:22 a.m. UTC | #5

On 3/6/24 11:28, maobibo wrote:
> On 2024/3/6 上午2:26, WANG Xuerui wrote:
>> On 3/4/24 17:10, maobibo wrote:
>>> On 2024/3/2 下午5:41, WANG Xuerui wrote:
>>>> On 3/2/24 16:47, Bibo Mao wrote:
>>>>> [snip]
>>>>> +
>>>>> +KVM hypercall ABI
>>>>> +=================
>>>>> +
>>>>> +Hypercall ABI on KVM is simple, only one scratch register a0 (v0) 
>>>>> and at most
>>>>> +five generic registers used as input parameter. FP register and 
>>>>> vector register
>>>>> +is not used for input register and should not be modified during 
>>>>> hypercall.
>>>>> +Hypercall function can be inlined since there is only one scratch 
>>>>> register.
>>>>
>>>> It should be pointed out explicitly that on hypercall return all 
>>> Well, return value description will added. What do think about the 
>>> meaning of return value for KVM_HCALL_FUNC_PV_IPI hypercall?  The 
>>> number of CPUs with IPI delivered successfully like kvm x86 or simply 
>>> success/failure?

I just noticed I've forgotten to comment on this question. FYI, RISC-V 
SBI's equivalent [1] doesn't even indicate errors. And from my 
perspective, we can always add a new hypercall returning more info 
should that info is needed in the future; for now I don't have a problem 
whether the return type is void, bool or number of CPUs that are 
successfully reached.

[1]: 
https://github.com/riscv-non-isa/riscv-sbi-doc/blob/v2.0/src/ext-ipi.adoc

>>>> architectural state except ``$a0`` is preserved. Or is the whole 
>>>> ``$a0 - $t8`` range clobbered, just like with Linux syscalls?
>>>>
>>> what is advantage with $a0 - > $t8 clobbered?
>>
>> Because then a hypercall is going to behave identical as an ordinary C 
>> function call, which is easy for people and compilers to understand.
>>
> If you really understand detailed behavior about hypercall/syscall, the 
> conclusion may be different.
> 
> If T0 - T8 is clobbered with hypercall instruction, hypercall caller 
> need save clobbered register, now hypercall exception save/restore all 
> the registers during VM exits. If so, hypercall caller need not save 
> general registers and it is not necessary scratched for hypercall ABI.
> 
> Until now all the discussion the macro level, no detail code level.
> 
> Can you show me some example code where T0-T8 need not save/restore 
> during LoongArch hypercall exception?

I was emphasizing that consistency is generally good, and yes that's 
"macroscopic" level talk. Of course, the hypercall client code would 
have to do *less* work if *more* registers than the minimum are 
preserved -- if right now everything is already preserved, nothing needs 
to change.

But please also notice that the context switch cost is paid for every 
hypercall, and we can't reduce the number of preserved registers without 
breaking compatibility. So I think we can keep the current 
implementation behavior, but promise less in the spec: this way we'll 
keep the possibility of reducing the context switch overhead.

[v6,7/7] Documentation: KVM: Add hypercall for LoongArch

Commit Message

Comments

Patch