Message ID | 20240315080710.2812974-4-maobibo@loongson.cn (mailing list archive) |
---|---|
State | New, archived |
Series | LoongArch: Add pv ipi support on LoongArch VM |
On 3/15/24 16:07, Bibo Mao wrote:
> Instruction cpucfg can be used to get processor features. And there
> is trap exception when it is executed in VM mode, and also it is
> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
> is used. Here one specified area 0x40000000 -- 0x400000ff is used
> for KVM hypervisor to provide PV features, and the area can be extended
> for other hypervisors in future. This area will never be used for
> real HW, it is only used by software.
>
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
>  arch/loongarch/include/asm/inst.h      |  1 +
>  arch/loongarch/include/asm/loongarch.h | 10 +++++
>  arch/loongarch/kvm/exit.c              | 59 +++++++++++++++++++-------
>  3 files changed, 54 insertions(+), 16 deletions(-)
>

Sorry for the late reply, but I think it may be a bit non-constructive
to repeatedly submit the same code without due explanation in our
previous review threads. Let me try to recollect some of the details
though...

If I remember correctly, during the previous reviews, it was mentioned
that the only upsides of using CPUCFG were:

- it was exactly identical to the x86 approach,
- it would not require access to the LoongArch Reference Manual Volume 3
  to use, and
- it was plain old data.

But, for the first point, we don't have to follow x86 convention after
all. The second reason might be compelling, but on the one hand that's
another problem orthogonal to the current one, and on the other hand
HVCL is:

- already effectively public because of the fact that this very patchset
  is public,
- its semantics is trivial to implement even without access to the LVZ
  manual, because of its striking similarity with SYSCALL, and
- by being a function call, we reserve the possibility for hypervisors
  to invoke logic for self-identification purposes, even if this is likely
  overkill from today's perspective.

And, even if we decide that using HVCL for self-identification is
overkill after all, we still have another choice that's IOCSR. We
already read LOONGARCH_IOCSR_FEATURES (0x8) for its bit 11 (IOCSRF_VM)
to populate the CPU_FEATURE_HYPERVISOR bit, and it's only natural that
we put the identification word in the IOCSR space. As far as I can see,
the IOCSR space is plentiful and equally available for making reservations;
it can only be even easier when it's done by a Loongson team.

Finally, I've mentioned multiple times that varying CPUCFG behavior
based on PLV is not something well documented in the manuals, hence not
friendly to low-level developers. Devs of third-party firmware and/or
kernels do exist, I've personally spoken to some of them at the
2023-11-18 3A6000 release event; in order for the varying CPUCFG
behavior approach to pass for me, at the very least, the LoongArch
reference manual must be amended to explicitly include an explanation of
it, and a reference to potential use cases.
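As a rough illustration of the IOCSR-based check mentioned above, the following user-space C model tests bit 11 of the features word. It is only a sketch: `struct model_iocsr`, `model_iocsr_read32()` and `model_cpu_has_hypervisor()` are hypothetical names invented here, and a real kernel would issue an IOCSR read rather than load a struct field. Only the register offset 0x8 and the bit position 11 come from the message.

```c
#include <assert.h>
#include <stdint.h>

/* From the message: features register at IOCSR offset 0x8, VM flag in bit 11. */
#define LOONGARCH_IOCSR_FEATURES 0x8u
#define IOCSRF_VM                (UINT32_C(1) << 11)

static_assert(IOCSRF_VM == 0x800, "IOCSRF_VM must be bit 11");

/* Hypothetical stand-in for the IOCSR space; real code reads hardware. */
struct model_iocsr {
	uint32_t features;
};

/* Hypothetical accessor: registers other than FEATURES read as 0 here. */
uint32_t model_iocsr_read32(const struct model_iocsr *io, uint32_t reg)
{
	return reg == LOONGARCH_IOCSR_FEATURES ? io->features : 0;
}

/* Returns 1 when the features word has IOCSRF_VM set, 0 otherwise. */
int model_cpu_has_hypervisor(const struct model_iocsr *io)
{
	return (model_iocsr_read32(io, LOONGARCH_IOCSR_FEATURES) & IOCSRF_VM) != 0;
}
```

A caller would fill `features` with the value read from the platform and use the result to decide whether to set a hypervisor feature flag. Where an identification word would live in the IOCSR space is not specified in the thread, so the model does not guess at it.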
On 2024/3/24 3:02 AM, WANG Xuerui wrote:
> On 3/15/24 16:07, Bibo Mao wrote:
>> Instruction cpucfg can be used to get processor features. And there
>> is trap exception when it is executed in VM mode, and also it is
>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
>> for KVM hypervisor to provide PV features, and the area can be extended
>> for other hypervisors in future. This area will never be used for
>> real HW, it is only used by software.
>>
>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>> ---
>>  arch/loongarch/include/asm/inst.h      |  1 +
>>  arch/loongarch/include/asm/loongarch.h | 10 +++++
>>  arch/loongarch/kvm/exit.c              | 59 +++++++++++++++++++-------
>>  3 files changed, 54 insertions(+), 16 deletions(-)
>>
>
> Sorry for the late reply, but I think it may be a bit non-constructive
> to repeatedly submit the same code without due explanation in our
> previous review threads. Let me try to recollect some of the details
> though...

Because your review comments about the hypercall method are wrong, I need
not adopt them.

>
> If I remember correctly, during the previous reviews, it was mentioned
> that the only upsides of using CPUCFG were:
>
> - it was exactly identical to the x86 approach,
> - it would not require access to the LoongArch Reference Manual Volume 3
>   to use, and
> - it was plain old data.
>
> But, for the first point, we don't have to follow x86 convention after

X86 virtualization is successfully and widely applied in our life and
products. It is normal to follow it if there are no obvious issues.

> all. The second reason might be compelling, but on the one hand that's
> another problem orthogonal to the current one, and on the other hand
> HVCL is:
>
> - already effectively public because of the fact that this very patchset
>   is public,
> - its semantics is trivial to implement even without access to the LVZ
>   manual, because of its striking similarity with SYSCALL, and
> - by being a function call, we reserve the possibility for hypervisors
>   to invoke logic for self-identification purposes, even if this is likely
>   overkill from today's perspective.
>
> And, even if we decide that using HVCL for self-identification is
> overkill after all, we still have another choice that's IOCSR. We
> already read LOONGARCH_IOCSR_FEATURES (0x8) for its bit 11 (IOCSRF_VM)
> to populate the CPU_FEATURE_HYPERVISOR bit, and it's only natural that
> we put the identification word in the IOCSR space. As far as I can see,
> the IOCSR space is plentiful and equally available for making reservations;
> it can only be even easier when it's done by a Loongson team.

The IOCSR method is also possible. In chip design, CPUCFG is used for CPU
features and IOCSR is for device features. Here the CPUCFG method is
selected. I am the KVM LoongArch maintainer and I can decide which method
to select if the method works well. Is that right?

If you are interested in KVM LoongArch, you can submit more patches and
become a maintainer, or write new hypervisor support such as Xen/Xvisor
etc., and use your method.

Also, as you are interested in the Linux kernel, there are some issues.
Can you help to improve them?

1. T0-T7 are scratch registers in the SYSCALL ABI, which is what you
suggested. Is there information leaking to user space through the T0-T7
registers?

2. LoongArch KVM depends on AS_HAS_LVZ_EXTENSION, which requires the
latest binutils. This is also what you suggested. Some kernel developers
do not have the latest binutils; when common KVM code is modified and
LoongArch KVM fails to compile, they cannot notice it, since their
LoongArch cross-compiler is old and LoongArch KVM is disabled. This issue
can be found at https://lkml.org/lkml/2023/11/15/828.

Regards
Bibo Mao

>
> Finally, I've mentioned multiple times that varying CPUCFG behavior
> based on PLV is not something well documented in the manuals, hence not
> friendly to low-level developers. Devs of third-party firmware and/or
> kernels do exist, I've personally spoken to some of them at the
> 2023-11-18 3A6000 release event; in order for the varying CPUCFG
> behavior approach to pass for me, at the very least, the LoongArch
> reference manual must be amended to explicitly include an explanation of
> it, and a reference to potential use cases.
>
On Tue, 2024-04-02 at 09:43 +0800, maobibo wrote:
> > Sorry for the late reply, but I think it may be a bit non-constructive
> > to repeatedly submit the same code without due explanation in our
> > previous review threads. Let me try to recollect some of the details
> > though...
> Because your review comments about the hypercall method are wrong, I need
> not adopt them.

Again it's unfair to say so considering the lack of LVZ documentation.

/* snip */

>
> 1. T0-T7 are scratch registers in the SYSCALL ABI, which is what you
> suggested. Is there information leaking to user space through the T0-T7
> registers?

It's not a problem. When a syscall returns, RESTORE_ALL_AND_RET is invoked
even though T0-T7 are not saved. So a "junk" value will be read from the
leading PT_SIZE bytes of the kernel stack for this thread.

The leading PT_SIZE bytes of the kernel stack are dedicated to storing
the struct pt_regs representing the reg file of the thread in the
userspace.

Thus we may only read out the userspace T0-T7 value stored when the same
thread was interrupted or trapped last time, or 0 (if the thread was
never interrupted or trapped before).

And it's impossible to read some data used by the kernel internally, or
some data of another thread.

But indeed there is some room for improvement here. Zeroing these
registers seems cleaner than reading out the junk values, and also faster
(move $t0, $r0 is faster than ld.d $t0, $sp, PT_R12). Not sure if it's
worth violating Huacai's "keep things simple" aspiration though.
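The argument above can be modelled in a few lines of user-space C. This is a toy model, not the kernel's entry code: `struct model_thread` and the `model_*` functions are hypothetical. The facts taken from the message are that syscall entry does not save T0-T7, that the return path restores every register from the thread's own pt_regs area, that the area reads as 0 before the first trap, and that $t0 lives in slot 12 (PT_R12). Treating T0-T7 as slots 12 to 19 follows the LoongArch register numbering.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NUM_GPRS 32
#define MODEL_T0       12	/* $t0 is $r12, hence PT_R12 in the message */
#define MODEL_T7       19	/* $t7 is $r19 */

static_assert(MODEL_T7 - MODEL_T0 + 1 == 8, "T0-T7 is eight registers");

/* Hypothetical per-thread state; callers zero it when the thread is created. */
struct model_thread {
	uint64_t user_regs[MODEL_NUM_GPRS];	/* live user register file */
	uint64_t pt_regs[MODEL_NUM_GPRS];	/* slot at the top of this thread's kernel stack */
};

static int model_is_temp(int r)
{
	return r >= MODEL_T0 && r <= MODEL_T7;
}

/* Interrupt or exception entry: every user register is saved. */
void model_trap_entry(struct model_thread *t)
{
	memcpy(t->pt_regs, t->user_regs, sizeof(t->pt_regs));
}

/* Syscall entry: T0-T7 are scratch, so their pt_regs slots are left alone. */
void model_syscall_entry(struct model_thread *t)
{
	for (int r = 0; r < MODEL_NUM_GPRS; r++)
		if (!model_is_temp(r))
			t->pt_regs[r] = t->user_regs[r];
}

/* Return to user space: restore everything, as RESTORE_ALL_AND_RET does. */
void model_return_to_user(struct model_thread *t)
{
	memcpy(t->user_regs, t->pt_regs, sizeof(t->user_regs));
}
```

In this model a syscall can only hand back, in T0-T7, whatever the same thread's user-space T0-T7 held at its last trap (or 0 if it never trapped), because nothing else is ever written to those slots. Whether the real kernel stack layout always upholds that property is exactly what the thread goes on to debate.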
On 2024/4/2 10:49 AM, Xi Ruoyao wrote:
> On Tue, 2024-04-02 at 09:43 +0800, maobibo wrote:
>>> Sorry for the late reply, but I think it may be a bit non-constructive
>>> to repeatedly submit the same code without due explanation in our
>>> previous review threads. Let me try to recollect some of the details
>>> though...
>> Because your review comments about the hypercall method are wrong, I need
>> not adopt them.
>
> Again it's unfair to say so considering the lack of LVZ documentation.
>
> /* snip */
>
>>
>> 1. T0-T7 are scratch registers in the SYSCALL ABI, which is what you
>> suggested. Is there information leaking to user space through the T0-T7
>> registers?
>
> It's not a problem. When a syscall returns, RESTORE_ALL_AND_RET is invoked
> even though T0-T7 are not saved. So a "junk" value will be read from the
> leading PT_SIZE bytes of the kernel stack for this thread.

For you it is a "junk" value; some people may think it is useful.

There is another issue, since the kernel restores the T0-T7 registers and
user space saves T0-T7. Why are T0-T7 scratch registers rather than
preserved registers like on other architectures? What is the advantage if
they are scratch registers?

Regards
Bibo Mao

>
> The leading PT_SIZE bytes of the kernel stack are dedicated to storing
> the struct pt_regs representing the reg file of the thread in the
> userspace.
>
> Thus we may only read out the userspace T0-T7 value stored when the same
> thread was interrupted or trapped last time, or 0 (if the thread was
> never interrupted or trapped before).
>
> And it's impossible to read some data used by the kernel internally, or
> some data of another thread.
>
> But indeed there is some room for improvement here. Zeroing these
> registers seems cleaner than reading out the junk values, and also faster
> (move $t0, $r0 is faster than ld.d $t0, $sp, PT_R12). Not sure if it's
> worth violating Huacai's "keep things simple" aspiration though.
>
On 2024/4/2 10:49 AM, Xi Ruoyao wrote:
> On Tue, 2024-04-02 at 09:43 +0800, maobibo wrote:
>>> Sorry for the late reply, but I think it may be a bit non-constructive
>>> to repeatedly submit the same code without due explanation in our
>>> previous review threads. Let me try to recollect some of the details
>>> though...
>> Because your review comments about the hypercall method are wrong, I need
>> not adopt them.
>
> Again it's unfair to say so considering the lack of LVZ documentation.
>
> /* snip */
>
>>
>> 1. T0-T7 are scratch registers in the SYSCALL ABI, which is what you
>> suggested. Is there information leaking to user space through the T0-T7
>> registers?
>
> It's not a problem. When a syscall returns, RESTORE_ALL_AND_RET is invoked
> even though T0-T7 are not saved. So a "junk" value will be read from the
> leading PT_SIZE bytes of the kernel stack for this thread.
>
> The leading PT_SIZE bytes of the kernel stack are dedicated to storing
> the struct pt_regs representing the reg file of the thread in the
> userspace.

Not all syscalls use the leading PT_SIZE bytes of the kernel stack. It is
complicated when a syscall is combined with interrupts and signals.

>
> Thus we may only read out the userspace T0-T7 value stored when the same
> thread was interrupted or trapped last time, or 0 (if the thread was
> never interrupted or trapped before).
>
> And it's impossible to read some data used by the kernel internally, or
> some data of another thread.

Are you sure that it's impossible to read some data used by the kernel
internally?

Regards
Bibo Mao

>
> But indeed there is some room for improvement here. Zeroing these
> registers seems cleaner than reading out the junk values, and also faster
> (move $t0, $r0 is faster than ld.d $t0, $sp, PT_R12). Not sure if it's
> worth violating Huacai's "keep things simple" aspiration though.
>
On Tue, 2024-04-02 at 11:34 +0800, maobibo wrote:

> Are you sure that it's impossible to read some data used by the kernel
> internally?

Yes.

> There is another issue, since the kernel restores the T0-T7 registers and
> user space saves T0-T7. Why are T0-T7 scratch registers rather than
> preserved registers like on other architectures? What is the advantage if
> they are scratch registers?

I'd say "MIPS legacy." Note that MIPS also does not preserve temp
registers, and MIPS does not have the "info leak" issue either (or it
would have been assigned a CVE in all these years).

I do agree maybe it's time to move away from MIPS legacy and be more
similar to RISC-V etc. now...

In Glibc we can condition __SYSCALL_CLOBBERS with
#if __LINUX_KERNEL_VERSION > xxxxxxx to take advantage of it.

Huacai, Xuerui, what do you think?
diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
index d8f637f9e400..ad120f924905 100644
--- a/arch/loongarch/include/asm/inst.h
+++ b/arch/loongarch/include/asm/inst.h
@@ -67,6 +67,7 @@ enum reg2_op {
 	revhd_op = 0x11,
 	extwh_op = 0x16,
 	extwb_op = 0x17,
+	cpucfg_op = 0x1b,
 	iocsrrdb_op = 0x19200,
 	iocsrrdh_op = 0x19201,
 	iocsrrdw_op = 0x19202,
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index 46366e783c84..a1d22e8b6f94 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -158,6 +158,16 @@
 #define CPUCFG48_VFPU_CG BIT(2)
 #define CPUCFG48_RAM_CG BIT(3)
 
+/*
+ * cpucfg index area: 0x40000000 -- 0x400000ff
+ * SW emulation for KVM hypervisor
+ */
+#define CPUCFG_KVM_BASE 0x40000000UL
+#define CPUCFG_KVM_SIZE 0x100
+#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
+#define KVM_SIGNATURE "KVM\0"
+#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
+
 #ifndef __ASSEMBLY__
 
 /* CSR */
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index 923bbca9bd22..a8d3b652d3ea 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -206,10 +206,50 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
 	return EMULATE_DONE;
 }
 
-static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
 {
 	int rd, rj;
 	unsigned int index;
+	unsigned long plv;
+
+	rd = inst.reg2_format.rd;
+	rj = inst.reg2_format.rj;
+	++vcpu->stat.cpucfg_exits;
+	index = vcpu->arch.gprs[rj];
+
+	/*
+	 * By LoongArch Reference Manual 2.2.10.5
+	 * Return value is 0 for undefined cpucfg index
+	 *
+	 * Disable preemption since hw gcsr is accessed
+	 */
+	preempt_disable();
+	plv = kvm_read_hw_gcsr(LOONGARCH_CSR_CRMD) >> CSR_CRMD_PLV_SHIFT;
+	switch (index) {
+	case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
+		vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
+		break;
+	case CPUCFG_KVM_SIG:
+		/*
+		 * Cpucfg emulation between 0x40000000 -- 0x400000ff
+		 * Return value with 0 if executed in user mode
+		 */
+		if ((plv & CSR_CRMD_PLV) == PLV_KERN)
+			vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
+		else
+			vcpu->arch.gprs[rd] = 0;
+		break;
+	default:
+		vcpu->arch.gprs[rd] = 0;
+		break;
+	}
+
+	preempt_enable();
+	return EMULATE_DONE;
+}
+
+static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+{
 	unsigned long curr_pc;
 	larch_inst inst;
 	enum emulation_result er = EMULATE_DONE;
@@ -224,21 +264,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
 		er = EMULATE_FAIL;
 		switch (((inst.word >> 24) & 0xff)) {
 		case 0x0: /* CPUCFG GSPR */
-			if (inst.reg2_format.opcode == 0x1B) {
-				rd = inst.reg2_format.rd;
-				rj = inst.reg2_format.rj;
-				++vcpu->stat.cpucfg_exits;
-				index = vcpu->arch.gprs[rj];
-				er = EMULATE_DONE;
-				/*
-				 * By LoongArch Reference Manual 2.2.10.5
-				 * return value is 0 for undefined cpucfg index
-				 */
-				if (index < KVM_MAX_CPUCFG_REGS)
-					vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
-				else
-					vcpu->arch.gprs[rd] = 0;
-			}
+			if (inst.reg2_format.opcode == cpucfg_op)
+				er = kvm_emu_cpucfg(vcpu, inst);
 			break;
 		case 0x4: /* CSR{RD,WR,XCHG} GSPR */
 			er = kvm_handle_csr(vcpu, inst);
Instruction cpucfg can be used to get processor features. And there
is trap exception when it is executed in VM mode, and also it is
to provide cpu features to VM. On real hardware cpucfg area 0 - 20
is used. Here one specified area 0x40000000 -- 0x400000ff is used
for KVM hypervisor to provide PV features, and the area can be extended
for other hypervisors in future. This area will never be used for
real HW, it is only used by software.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 arch/loongarch/include/asm/inst.h      |  1 +
 arch/loongarch/include/asm/loongarch.h | 10 +++++
 arch/loongarch/kvm/exit.c              | 59 +++++++++++++++++++-------
 3 files changed, 54 insertions(+), 16 deletions(-)
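The lookup described above can be sketched as a small user-space model, together with the probe a guest kernel might perform. The base index and the signature string are taken from the patch. Everything prefixed `MODEL_` or `model_` is an assumption made for this sketch: `MODEL_MAX_CPUCFG_REGS` is set to 21 to match the "0 - 20" range in the description, and the two privilege-level constants stand in for the kernel's PLV definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values taken from the patch. */
#define CPUCFG_KVM_BASE 0x40000000UL
#define CPUCFG_KVM_SIG  CPUCFG_KVM_BASE
#define KVM_SIGNATURE   "KVM\0"

/* Assumptions for this model only. */
#define MODEL_MAX_CPUCFG_REGS 21	/* hardware words 0 - 20 */
#define MODEL_PLV_KERN        0
#define MODEL_PLV_USER        3

static_assert(sizeof(KVM_SIGNATURE) >= sizeof(uint32_t),
	      "signature must fill one 32-bit cpucfg word");

/* Hypothetical stand-in for the per-vCPU cpucfg table. */
struct model_vcpu {
	uint32_t cpucfg[MODEL_MAX_CPUCFG_REGS];
};

/* Model of the emulated cpucfg read: undefined indexes return 0. */
uint32_t model_cpucfg(const struct model_vcpu *vcpu, uint32_t index, unsigned int plv)
{
	uint32_t sig;

	if (index < MODEL_MAX_CPUCFG_REGS)
		return vcpu->cpucfg[index];
	if (index == CPUCFG_KVM_SIG && plv == MODEL_PLV_KERN) {
		/* memcpy avoids the pointer cast used in the patch */
		memcpy(&sig, KVM_SIGNATURE, sizeof(sig));
		return sig;
	}
	return 0;
}

/* Guest-side probe: 1 if the signature word reads back as "KVM\0". */
int model_guest_sees_kvm(const struct model_vcpu *vcpu, unsigned int plv)
{
	uint32_t word = model_cpucfg(vcpu, CPUCFG_KVM_SIG, plv);

	return memcmp(&word, KVM_SIGNATURE, sizeof(word)) == 0;
}
```

In this model the probe succeeds only at kernel privilege, which mirrors the PLV-dependent behaviour that the review comments object to. Index `CPUCFG_KVM_BASE + 4` reads as 0 here because this patch defines `CPUCFG_KVM_FEATURE` but adds no case for it in the switch.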