[v7,3/7] LoongArch: KVM: Add cpucfg area for kvm hypervisor

Message ID	20240315080710.2812974-4-maobibo@loongson.cn (mailing list archive)
State	New, archived
Headers	Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by smtp.subspace.kernel.org (Postfix) with ESMTP id BE3D914A9D; Fri, 15 Mar 2024 08:07:22 +0000 (UTC)
From: Bibo Mao <maobibo@loongson.cn>
To: Huacai Chen <chenhuacai@kernel.org>, Tianrui Zhao <zhaotianrui@loongson.cn>, WANG Xuerui <kernel@xen0n.name>, Juergen Gross <jgross@suse.com>, Paolo Bonzini <pbonzini@redhat.com>, Jonathan Corbet <corbet@lwn.net>
Cc: loongarch@lists.linux.dev, linux-kernel@vger.kernel.org, virtualization@lists.linux.dev, kvm@vger.kernel.org
Subject: [PATCH v7 3/7] LoongArch: KVM: Add cpucfg area for kvm hypervisor
Date: Fri, 15 Mar 2024 16:07:06 +0800
Message-Id: <20240315080710.2812974-4-maobibo@loongson.cn>
In-Reply-To: <20240315080710.2812974-1-maobibo@loongson.cn>
References: <20240315080710.2812974-1-maobibo@loongson.cn>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Series	LoongArch: Add pv ipi support on LoongArch VM
[v7,0/7] LoongArch: Add pv ipi support on LoongArch VM
[v7,1/7] LoongArch/smp: Refine some ipi functions on LoongArch platform
[v7,2/7] LoongArch: KVM: Add hypercall instruction emulation support
[v7,3/7] LoongArch: KVM: Add cpucfg area for kvm hypervisor
[v7,4/7] LoongArch: KVM: Add vcpu search support from physical cpuid
[v7,5/7] LoongArch: KVM: Add pv ipi support on kvm side
[v7,6/7] LoongArch: Add pv ipi support on guest kernel side
[v7,7/7] Documentation: KVM: Add hypercall for LoongArch

Bibo Mao March 15, 2024, 8:07 a.m. UTC

The cpucfg instruction can be used to query processor features. It
traps when executed in VM mode, so the hypervisor can also use it to
provide CPU features to the VM. On real hardware, cpucfg area 0 - 20
is used. Here a dedicated area, 0x40000000 - 0x400000ff, is reserved
for the KVM hypervisor to provide PV features, and the area can be
extended for other hypervisors in the future. This area will never be
used by real hardware; it is only used by software.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 arch/loongarch/include/asm/inst.h      |  1 +
 arch/loongarch/include/asm/loongarch.h | 10 +++++
 arch/loongarch/kvm/exit.c              | 59 +++++++++++++++++++-------
 3 files changed, 54 insertions(+), 16 deletions(-)
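
As a rough illustration of the index split described in the commit message,
the sketch below classifies a cpucfg index as belonging to the hardware
area (0 - 20) or to the software-only KVM area (0x40000000 - 0x400000ff).
The macro, enum, and function names are hypothetical and are not taken from
the patch; only the numeric ranges come from the message above.

```c
#include <stdint.h>

/* Ranges from the commit message; all identifiers here are illustrative. */
#define CPUCFG_HW_LAST   20u          /* last index used by real hardware */
#define CPUCFG_KVM_FIRST 0x40000000u  /* first index of the KVM PV area */
#define CPUCFG_KVM_LAST  0x400000ffu  /* last index of the KVM PV area */

enum cpucfg_area {
    CPUCFG_AREA_HW,       /* answered from (virtualized) hardware state */
    CPUCFG_AREA_KVM,      /* answered by the hypervisor in software */
    CPUCFG_AREA_UNKNOWN   /* neither range; handling is left open here */
};

/* Decide which area a trapped cpucfg index falls into. */
static enum cpucfg_area classify_cpucfg_index(uint32_t index)
{
    if (index <= CPUCFG_HW_LAST)
        return CPUCFG_AREA_HW;
    if (index >= CPUCFG_KVM_FIRST && index <= CPUCFG_KVM_LAST)
        return CPUCFG_AREA_KVM;
    return CPUCFG_AREA_UNKNOWN;
}
```

A hypervisor's trap handler could switch on the returned area to pick the
emulation path; what to return for indices outside both ranges is not
specified in the message and is left open in this sketch.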

WANG Xuerui March 23, 2024, 7:02 p.m. UTC | #1

On 3/15/24 16:07, Bibo Mao wrote:
> The cpucfg instruction can be used to query processor features. It
> traps when executed in VM mode, so the hypervisor can also use it to
> provide CPU features to the VM. On real hardware, cpucfg area 0 - 20
> is used. Here a dedicated area, 0x40000000 - 0x400000ff, is reserved
> for the KVM hypervisor to provide PV features, and the area can be
> extended for other hypervisors in the future. This area will never be
> used by real hardware; it is only used by software.
> 
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
>   arch/loongarch/include/asm/inst.h      |  1 +
>   arch/loongarch/include/asm/loongarch.h | 10 +++++
>   arch/loongarch/kvm/exit.c              | 59 +++++++++++++++++++-------
>   3 files changed, 54 insertions(+), 16 deletions(-)
> 

Sorry for the late reply, but I think it may be a bit non-constructive 
to repeatedly submit the same code without due explanation in our 
previous review threads. Let me try to recollect some of the details 
though...

If I remember correctly, during the previous reviews, it was mentioned 
that the only upsides of using CPUCFG were:

- it was exactly identical to the x86 approach,
- it would not require access to the LoongArch Reference Manual Volume 3 
to use, and
- it was plain old data.

But, for the first point, we don't have to follow x86 convention after 
all. The second reason might be compelling, but on the one hand that's 
another problem orthogonal to the current one, and on the other hand 
HVCL is:

- already effectively public because of the fact that this very patchset 
is public,
- its semantics is trivial to implement even without access to the LVZ 
manual, because of its striking similarity with SYSCALL, and
- by being a function call, we reserve the possibility for hypervisors 
to invoke logic for self-identification purposes, even if this is likely 
overkill from today's perspective.

And, even if we decide that using HVCL for self-identification is 
overkill after all, we still have another choice that's IOCSR. We 
already read LOONGARCH_IOCSR_FEATURES (0x8) for its bit 11 (IOCSRF_VM) 
to populate the CPU_FEATURE_HYPERVISOR bit, and it's only natural that 
we put the identification word in the IOCSR space. As far as I can see, 
the IOCSR space is plenty and equally available for making reservations; 
it can only be even easier when it's done by a Loongson team.
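
For readers unfamiliar with the existing check, here is a minimal sketch of
the bit test mentioned above. The offset 0x8 and bit 11 are quoted from this
message; the helper name is hypothetical, and the helper takes an
already-read feature word rather than performing the IOCSR access itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the message above. */
#define LOONGARCH_IOCSR_FEATURES 0x8u                /* IOCSR offset of the feature word */
#define IOCSRF_VM                (UINT64_C(1) << 11) /* bit 11: running in a VM */

/*
 * Hypothetical helper: given the 64-bit word read from
 * LOONGARCH_IOCSR_FEATURES, report whether the VM bit is set.
 */
static bool iocsr_features_has_vm(uint64_t features)
{
    return (features & IOCSRF_VM) != 0;
}
```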

Finally, as I've mentioned multiple times, varying CPUCFG behavior 
based on PLV is not well documented in the manuals, and hence not 
friendly to low-level developers. Developers of third-party firmware 
and/or kernels do exist; I've personally spoken to some of them at the 
2023-11-18 3A6000 release event. For the varying-CPUCFG-behavior 
approach to pass for me, at the very least the LoongArch 
reference manual must be amended to explicitly include an explanation of 
it, along with a reference to potential use cases.

Bibo Mao April 2, 2024, 1:43 a.m. UTC | #2

On 2024/3/24 3:02 AM, WANG Xuerui wrote:
> On 3/15/24 16:07, Bibo Mao wrote:
>> The cpucfg instruction can be used to query processor features. It
>> traps when executed in VM mode, so the hypervisor can also use it to
>> provide CPU features to the VM. On real hardware, cpucfg area 0 - 20
>> is used. Here a dedicated area, 0x40000000 - 0x400000ff, is reserved
>> for the KVM hypervisor to provide PV features, and the area can be
>> extended for other hypervisors in the future. This area will never be
>> used by real hardware; it is only used by software.
>>
>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>> ---
>>   arch/loongarch/include/asm/inst.h      |  1 +
>>   arch/loongarch/include/asm/loongarch.h | 10 +++++
>>   arch/loongarch/kvm/exit.c              | 59 +++++++++++++++++++-------
>>   3 files changed, 54 insertions(+), 16 deletions(-)
>>
> 
> Sorry for the late reply, but I think it may be a bit non-constructive 
> to repeatedly submit the same code without due explanation in our 
> previous review threads. Let me try to recollect some of the details 
> though...
Because your review comments about the hypercall method are wrong, I need 
not adopt them.
> 
> If I remember correctly, during the previous reviews, it was mentioned 
> that the only upsides of using CPUCFG were:
> 
> - it was exactly identical to the x86 approach,
> - it would not require access to the LoongArch Reference Manual Volume 3 
> to use, and
> - it was plain old data.
> 
> But, for the first point, we don't have to follow x86 convention after 
x86 virtualization has been successfully and widely deployed in everyday 
life and in products. It is normal to follow it if there are no obvious 
issues.

> all. The second reason might be compelling, but on the one hand that's 
> another problem orthogonal to the current one, and on the other hand 
> HVCL is:
> 
> - already effectively public because of the fact that this very patchset 
> is public,
> - its semantics is trivial to implement even without access to the LVZ 
> manual, because of its striking similarity with SYSCALL, and
> - by being a function call, we reserve the possibility for hypervisors 
> to invoke logic for self-identification purposes, even if this is likely 
> overkill from today's perspective.
> 
> And, even if we decide that using HVCL for self-identification is 
> overkill after all, we still have another choice that's IOCSR. We 
> already read LOONGARCH_IOCSR_FEATURES (0x8) for its bit 11 (IOCSRF_VM) 
> to populate the CPU_FEATURE_HYPERVISOR bit, and it's only natural that 
> we put the identification word in the IOCSR space. As far as I can see, 
> the IOCSR space is plenty and equally available for making reservations; 
> it can only be even easier when it's done by a Loongson team.
The IOCSR method is also possible, but in the chip design CPUCFG is used 
for CPU features and IOCSR is for device features. The CPUCFG method is 
selected here; I am the KVM LoongArch maintainer, and I can decide which 
method to use as long as it works well. Is that right?

If you are interested in KVM LoongArch, you can submit more patches and 
become a maintainer, or write support for a new hypervisor such as 
Xen or Xvisor, and use your method there.

Also, since you are interested in the Linux kernel, there are some issues 
there. Can you help to improve them?

1. T0-T7 are scratch registers in the SYSCALL ABI, as you suggested. 
Can information leak to user space through the T0-T7 registers?

2. LoongArch KVM depends on AS_HAS_LVZ_EXTENSION, which requires the 
latest binutils. This is also what you suggested. Some kernel developers 
do not have the latest binutils, so when common KVM code is modified and 
LoongArch KVM fails to compile, they cannot notice it: their LoongArch 
cross-compiler is old and LoongArch KVM is disabled. This issue 
can be found at https://lkml.org/lkml/2023/11/15/828.

Regards
Bibo Mao
> 
> Finally, as I've mentioned multiple times, varying CPUCFG behavior 
> based on PLV is not well documented in the manuals, and hence not 
> friendly to low-level developers. Developers of third-party firmware 
> and/or kernels do exist; I've personally spoken to some of them at the 
> 2023-11-18 3A6000 release event. For the varying-CPUCFG-behavior 
> approach to pass for me, at the very least the LoongArch 
> reference manual must be amended to explicitly include an explanation of 
> it, along with a reference to potential use cases.
>

Xi Ruoyao April 2, 2024, 2:49 a.m. UTC | #3

On Tue, 2024-04-02 at 09:43 +0800, maobibo wrote:
> > Sorry for the late reply, but I think it may be a bit non-constructive 
> > to repeatedly submit the same code without due explanation in our 
> > previous review threads. Let me try to recollect some of the details
> > though...
> Because your review comments about the hypercall method are wrong, I need 
> not adopt them.

Again it's unfair to say so considering the lack of LVZ documentation.

/* snip */

> 
> 1. T0-T7 are scratch registers in the SYSCALL ABI, as you suggested. 
> Can information leak to user space through the T0-T7 registers?

It's not a problem.  When a syscall returns, RESTORE_ALL_AND_RET is invoked
even though T0-T7 were not saved.  So a "junk" value will be read from the
leading PT_SIZE bytes of the kernel stack for this thread.

The leading PT_SIZE bytes of the kernel stack are dedicated to storing
the struct pt_regs representing the register file of the thread in
userspace.

Thus we may only read out the userspace T0-T7 value stored when the same
thread was interrupted or trapped last time, or 0 (if the thread was
never interrupted or trapped before).

And it's impossible to read some data used by the kernel internally, or
some data of another thread.

But indeed there is room for improvement here.  Zeroing these registers
seems cleaner than reading out the junk values, and also faster (move
$t0, $r0 is faster than ld.d $t0, $sp, PT_R12).  Not sure if it's worth
violating Huacai's "keep things simple" aspiration though.
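
The argument above can be illustrated with a small user-space model. This is
only a toy under simplifying assumptions, not kernel code: a per-thread
frame stands in for struct pt_regs, a full save models interrupt or trap
entry, a partial save that skips $r12-$r19 (T0-T7) models syscall entry, and
a full restore models RESTORE_ALL_AND_RET. All identifiers are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define NUM_GPRS 32
#define REG_T0   12  /* $t0 is $r12, matching PT_R12 in the message above */
#define REG_T7   19  /* $t7 is $r19 */

/* Toy stand-in for the per-thread struct pt_regs at the top of the kernel stack. */
struct toy_pt_regs {
    uint64_t regs[NUM_GPRS];
};

/* Interrupt or trap entry: every user register is written to the frame. */
static void toy_save_all(struct toy_pt_regs *frame, const uint64_t cpu[NUM_GPRS])
{
    memcpy(frame->regs, cpu, sizeof(frame->regs));
}

/* Syscall entry (simplified): every register except T0-T7 is written. */
static void toy_save_syscall(struct toy_pt_regs *frame, const uint64_t cpu[NUM_GPRS])
{
    for (int i = 0; i < NUM_GPRS; i++) {
        if (i < REG_T0 || i > REG_T7)
            frame->regs[i] = cpu[i];
    }
}

/* Return to user space: every register is reloaded from the frame. */
static void toy_restore_all(const struct toy_pt_regs *frame, uint64_t cpu[NUM_GPRS])
{
    memcpy(cpu, frame->regs, sizeof(frame->regs));
}
```

In this model, whatever the kernel leaves in T0-T7 while handling the
syscall is overwritten on return by the frame contents, which only ever
hold this thread's own user-space values or the initial zeroes.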

Bibo Mao April 2, 2024, 3:04 a.m. UTC | #4

On 2024/4/2 10:49 AM, Xi Ruoyao wrote:
> On Tue, 2024-04-02 at 09:43 +0800, maobibo wrote:
>>> Sorry for the late reply, but I think it may be a bit non-constructive
>>> to repeatedly submit the same code without due explanation in our
>>> previous review threads. Let me try to recollect some of the details
>>> though...
>> Because your review comments about the hypercall method are wrong, I need
>> not adopt them.
> 
> Again it's unfair to say so considering the lack of LVZ documentation.
> 
> /* snip */
> 
>>
>> 1. T0-T7 are scratch registers in the SYSCALL ABI, as you suggested.
>> Can information leak to user space through the T0-T7 registers?
> 
> It's not a problem.  When a syscall returns, RESTORE_ALL_AND_RET is invoked
> even though T0-T7 were not saved.  So a "junk" value will be read from the
> leading PT_SIZE bytes of the kernel stack for this thread.
To you it is a "junk" value, but someone else may find it useful.

There is another issue: the kernel restores the T0-T7 registers and user 
space saves them. Why are T0-T7 scratch registers rather than preserved 
registers as on other architectures? What is the advantage of making 
them scratch registers?

Regards
Bibo Mao
> 
> The leading PT_SIZE bytes of the kernel stack are dedicated to storing
> the struct pt_regs representing the register file of the thread in
> userspace.
> 
> Thus we may only read out the userspace T0-T7 value stored when the same
> thread was interrupted or trapped last time, or 0 (if the thread was
> never interrupted or trapped before).
> 
> And it's impossible to read some data used by the kernel internally, or
> some data of another thread.
> 
> But indeed there is room for improvement here.  Zeroing these registers
> seems cleaner than reading out the junk values, and also faster (move
> $t0, $r0 is faster than ld.d $t0, $sp, PT_R12).  Not sure if it's worth
> violating Huacai's "keep things simple" aspiration though.
>

Bibo Mao April 2, 2024, 3:34 a.m. UTC | #5

On 2024/4/2 10:49 AM, Xi Ruoyao wrote:
> On Tue, 2024-04-02 at 09:43 +0800, maobibo wrote:
>>> Sorry for the late reply, but I think it may be a bit non-constructive
>>> to repeatedly submit the same code without due explanation in our
>>> previous review threads. Let me try to recollect some of the details
>>> though...
>> Because your review comments about the hypercall method are wrong, I need
>> not adopt them.
> 
> Again it's unfair to say so considering the lack of LVZ documentation.
> 
> /* snip */
> 
>>
>> 1. T0-T7 are scratch registers in the SYSCALL ABI, as you suggested.
>> Can information leak to user space through the T0-T7 registers?
> 
> It's not a problem.  When a syscall returns, RESTORE_ALL_AND_RET is invoked
> even though T0-T7 were not saved.  So a "junk" value will be read from the
> leading PT_SIZE bytes of the kernel stack for this thread.
> 
> The leading PT_SIZE bytes of the kernel stack are dedicated to storing
> the struct pt_regs representing the register file of the thread in
> userspace.
Not all syscalls use the leading PT_SIZE bytes of the kernel stack. It 
gets complicated when syscalls are combined with interrupts and signals.

> 
> Thus we may only read out the userspace T0-T7 value stored when the same
> thread was interrupted or trapped last time, or 0 (if the thread was
> never interrupted or trapped before).
> 
> And it's impossible to read some data used by the kernel internally, or
> some data of another thread.
Are you sure that it's impossible to read some data used by the kernel 
internally?

Regards
Bibo Mao
> 
> But indeed there is room for improvement here.  Zeroing these registers
> seems cleaner than reading out the junk values, and also faster (move
> $t0, $r0 is faster than ld.d $t0, $sp, PT_R12).  Not sure if it's worth
> violating Huacai's "keep things simple" aspiration though.
>

Xi Ruoyao April 2, 2024, 5:34 a.m. UTC | #6

On Tue, 2024-04-02 at 11:34 +0800, maobibo wrote:

> Are you sure that it's impossible to read some data used by the kernel
> internally?

Yes.

> There is another issue: the kernel restores the T0-T7 registers and user
> space saves them. Why are T0-T7 scratch registers rather than preserved
> registers as on other architectures? What is the advantage of making
> them scratch registers?

I'd say "MIPS legacy."  Note that MIPS also does not preserve temp
registers, and MIPS does not have the "info leak" issue either (otherwise
it would have been assigned a CVE in all these years).

I do agree that maybe it's time to move away from the MIPS legacy and be
more similar to RISC-V etc. now...

In Glibc we can condition __SYSCALL_CLOBBERS with #if
__LINUX_KERNEL_VERSION > xxxxxxx to take advantage of this.

Huacai, Xuerui, what do you think?
