Message ID | 20240829125656.19017-1-xry111@xry111.site (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Series | [v5] LoongArch: vDSO: Wire up getrandom() vDSO implementation |
On Thu, 2024-08-29 at 20:56 +0800, Xi Ruoyao wrote:
> diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c

Oops, stupid me. Please remove this hunk when applying the patch.

I'll remove this in v6 if needed.

> index bee9f7a3108f..ab258878d551 100644
> --- a/arch/loongarch/kernel/asm-offsets.c
> +++ b/arch/loongarch/kernel/asm-offsets.c
> @@ -14,6 +14,7 @@
>  #include <asm/ptrace.h>
>  #include <asm/processor.h>
>  #include <asm/ftrace.h>
> +#include <asm/vdso/vdso.h>
>
>  static void __used output_ptreg_defines(void)
>  {
On Thu, Aug 29, 2024 at 09:01:05PM +0800, Xi Ruoyao wrote:
> On Thu, 2024-08-29 at 20:56 +0800, Xi Ruoyao wrote:
> > diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c
>
> Oops, stupid me. Please remove this hunk when applying the patch.
>
> I'll remove this in v6 if needed.

While we're on the topic of herps and derps,

> Signed-off-by: Xi Ruoyao <xry111@xry111.site>
> ---
>
> Cc: linux-crypto@vger.kernel.org
> Cc: loongarch@lists.linux.dev
> Cc: linux-kernel@vger.kernel.org
> Cc: Jinyang He <hejinyang@loongson.cn>
> Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>

Typically all of the Cc tags go before your Signed-off-by and enter the
git history of the commit. "Here's who received this letter" sort of
documentation. (But not a big deal and something I can fix up upon
commit, of course.)
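For illustration only — a sketch assembled from the tags already quoted
above, not the actual v6 posting — a trailer block arranged the way Jason
describes keeps the Cc: lines above the "---" cut line, so they are
recorded in the commit when the patch is applied:

    Cc: linux-crypto@vger.kernel.org
    Cc: loongarch@lists.linux.dev
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Xi Ruoyao <xry111@xry111.site>
    ---
    Passed vdso_test_getrandom and vdso_test_chacha tests. Benchmark
    results: ...

Anything below the "---" (test results, per-version changelogs) is still
visible to reviewers but is dropped from the commit message by git am.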
Hi Huacai & Ruoyao,

On Thu, Aug 29, 2024 at 08:56:55PM +0800, Xi Ruoyao wrote:
> Hook up the generic vDSO implementation to the LoongArch vDSO data page
> by providing the required __arch_chacha20_blocks_nostack,
> __arch_get_k_vdso_rng_data, and getrandom_syscall implementations.
>
> Also enable the vDSO getrandom tests for LoongArch: create the symlink
> to the arch/loongarch/vdso directory, and correctly set the ARCH
> variable for LoongArch.
>
> Signed-off-by: Xi Ruoyao <xry111@xry111.site>
> ---
>
> Passed vdso_test_getrandom and vdso_test_chacha tests. Benchmark
> results:
>
> vdso_test_getrandom bench-single:
>
> vdso: 25000000 times in 0.499490289 seconds
> libc: 25000000 times in 6.963829873 seconds
> syscall: 25000000 times in 6.983413486 seconds
>
> vdso_test_getrandom bench-multi:
> vdso: 25000000 x 256 times in 28.703710823 seconds
> libc: 25000000 x 256 times in 356.835801784 seconds
> syscall: 25000000 x 256 times in 338.525837197 seconds

Thanks for including the test results. I've also verified vdso_test_chacha
and can confirm that it passes:

zx2c4@thinkpad ~/Projects/random-linux/tools/testing/selftests/vDSO $ /home/zx2c4/Downloads/Chrome/cross-tools/bin/loongarch64-unknown-linux-gnu-gcc -std=gnu99 -D_GNU_SOURCE= -idirafter /home/zx2c4/Projects/random-linux/tools/testing/selftests/../../../tools/include -idirafter /home/zx2c4/Projects/random-linux/tools/testing/selftests/../../../arch/loongarch/include -idirafter /home/zx2c4/Projects/random-linux/tools/testing/selftests/../../../include -D__ASSEMBLY__ -Wa,--noexecstack vdso_test_chacha.c /home/zx2c4/Projects/random-linux/tools/testing/selftests/../../../tools/arch/loongarch/vdso/vgetrandom-chacha.S -o /home/zx2c4/Projects/random-linux/tools/testing/selftests/vDSO/vdso_test_chacha -static
zx2c4@thinkpad ~/Projects/random-linux/tools/testing/selftests/vDSO $ qemu-loongarch64 /home/zx2c4/Projects/random-linux/tools/testing/selftests/vDSO/vdso_test_chacha
TAP version 13
1..1
ok 1 chacha: PASS

So, with the asm-offsets.c snippet removed as you instructed, this is
fine for me. I'll happily queue this up for 6.12, but I'd like Huacai's
Acked-by first. Huacai, does this look okay to you?

Thanks,
Jason
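For readers who want to reproduce the benchmark numbers quoted above, the
invocations used throughout this thread are the selftest binary's bench
subcommands, run on a LoongArch kernel with this patch applied (a sketch;
the build of tools/testing/selftests/vDSO is the usual one and is not
shown here):

    # run from tools/testing/selftests/vDSO on the target system
    ./vdso_test_getrandom bench-single
    ./vdso_test_getrandom bench-multi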
Hi, Ruoyao, On Thu, Aug 29, 2024 at 8:58 PM Xi Ruoyao <xry111@xry111.site> wrote: > > Hook up the generic vDSO implementation to the LoongArch vDSO data page > by providing the required __arch_chacha20_blocks_nostack, > __arch_get_k_vdso_rng_data, and getrandom_syscall implementations. > > Also enable the vDSO getrandom tests for LoongArch: create the symlink > to the arch/loongarch/vdso directory, and correctly set the ARCH > variable for LoongArch. > > Signed-off-by: Xi Ruoyao <xry111@xry111.site> > --- > > Cc: linux-crypto@vger.kernel.org > Cc: loongarch@lists.linux.dev > Cc: linux-kernel@vger.kernel.org > Cc: Jinyang He <hejinyang@loongson.cn> > Cc: Tiezhu Yang <yangtiezhu@loongson.cn> > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Christophe Leroy <christophe.leroy@csgroup.eu> > > Passed vdso_test_getrandom and vdso_test_chacha tests. Benchmark > results: > > vdso_test_getrandom bench-single: > > vdso: 25000000 times in 0.499490289 seconds > libc: 25000000 times in 6.963829873 seconds > syscall: 25000000 times in 6.983413486 seconds > > vdso_test_getrandom bench-multi: > vdso: 25000000 x 256 times in 28.703710823 seconds > libc: 25000000 x 256 times in 356.835801784 seconds > syscall: 25000000 x 256 times in 338.525837197 seconds > > [v4]->v5: > - Rebase onto crng/random.git: > - Remove two selftest patches. > - Remove __arch_chacha20_blocks_nostack forward declaration. > - Squash the remaining selftest patch into the vDSO getrandom > implementation patch. > - Remove ifdef CONFIG_VDSO_GETRANDOM and $(CONFIG_VDSO_GETRANDOM) as > they are always true in arch/loongarch. > - Remove asm-offsets.c change which has been already unneeded in v4. > - Add comment about rematerializing the constant in the assembly code. > - Add prototype for __vdso_getrandom to silence a -Wmissing-prototype > warning. > > [v3]->v4: > - Remove LSX implementation, which isn't much faster than the generic > implementaion. > - Rebase onto crng/random.git: > - Define __arch_get_k_vdso_rng_data instead of using inline asm to > provide the _vdso_rng_data symbol in a magic way. > - Remove memset.S. > - Use c-getrandom-y to easily include the generic C code. > - The benchmark results seem better than v3, maybe related to the TLS > refactoring in random.git. > - Add patches for selftests. > > [v2]->v3: > - Add a generic LoongArch implementation for which LSX isn't needed. > > v1->v2: > - Properly send the series to the list. 
> > [v4]:https://lore.kernel.org/all/20240827132018.88854-1-xry111@xry111.site/ > [v3]:https://lore.kernel.org/all/20240816110717.10249-1-xry111@xry111.site/ > [v2]:https://lore.kernel.org/all/20240815133357.35829-1-xry111@xry111.site/ > > arch/loongarch/Kconfig | 1 + > arch/loongarch/include/asm/vdso/getrandom.h | 44 ++++ > arch/loongarch/include/asm/vdso/vdso.h | 6 + > arch/loongarch/include/asm/vdso/vsyscall.h | 8 + > arch/loongarch/kernel/asm-offsets.c | 1 + > arch/loongarch/kernel/vdso.c | 2 + > arch/loongarch/vdso/Makefile | 7 +- > arch/loongarch/vdso/vdso.lds.S | 1 + > arch/loongarch/vdso/vgetrandom-chacha.S | 242 ++++++++++++++++++++ > arch/loongarch/vdso/vgetrandom.c | 15 ++ > tools/arch/loongarch/vdso | 1 + > tools/testing/selftests/vDSO/Makefile | 4 +- > 12 files changed, 329 insertions(+), 3 deletions(-) > create mode 100644 arch/loongarch/include/asm/vdso/getrandom.h > create mode 100644 arch/loongarch/vdso/vgetrandom-chacha.S > create mode 100644 arch/loongarch/vdso/vgetrandom.c > create mode 120000 tools/arch/loongarch/vdso > > diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > index 70f169210b52..14821c2aba5b 100644 > --- a/arch/loongarch/Kconfig > +++ b/arch/loongarch/Kconfig > @@ -190,6 +190,7 @@ config LOONGARCH > select TRACE_IRQFLAGS_SUPPORT > select USE_PERCPU_NUMA_NODE_ID > select USER_STACKTRACE_SUPPORT > + select VDSO_GETRANDOM > select ZONE_DMA32 > > config 32BIT > diff --git a/arch/loongarch/include/asm/vdso/getrandom.h b/arch/loongarch/include/asm/vdso/getrandom.h > new file mode 100644 > index 000000000000..04c991f6921d > --- /dev/null > +++ b/arch/loongarch/include/asm/vdso/getrandom.h > @@ -0,0 +1,44 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved. > + */ > +#ifndef __ASM_VDSO_GETRANDOM_H > +#define __ASM_VDSO_GETRANDOM_H > + > +#ifndef __ASSEMBLY__ > + > +#include <asm/unistd.h> > +#include <asm/vdso/vdso.h> > + > +static __always_inline ssize_t getrandom_syscall(void *_buffer, > + size_t _len, > + unsigned int _flags) > +{ > + register long ret asm("a0"); > + register long nr asm("a7") = __NR_getrandom; > + register void *buffer asm("a0") = _buffer; > + register size_t len asm("a1") = _len; > + register unsigned int flags asm("a2") = _flags; > + > + asm volatile( > + " syscall 0\n" > + : "+r" (ret) > + : "r" (nr), "r" (buffer), "r" (len), "r" (flags) > + : "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7", "$t8", > + "memory"); > + > + return ret; > +} > + > +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data( > + void) As I said before, no line break needed, because there is no 80 characters limit now. > +{ > + return (const struct vdso_rng_data *)( > + get_vdso_data() + > + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE + > + offsetof(struct loongarch_vdso_data, rng_data)); > +} The same, we don't need so many lines. 
> + > +#endif /* !__ASSEMBLY__ */ > + > +#endif /* __ASM_VDSO_GETRANDOM_H */ > diff --git a/arch/loongarch/include/asm/vdso/vdso.h b/arch/loongarch/include/asm/vdso/vdso.h > index 5a12309d9fb5..e31ac7474513 100644 > --- a/arch/loongarch/include/asm/vdso/vdso.h > +++ b/arch/loongarch/include/asm/vdso/vdso.h > @@ -4,6 +4,9 @@ > * Copyright (C) 2020-2022 Loongson Technology Corporation Limited > */ > > +#ifndef _ASM_VDSO_VDSO_H > +#define _ASM_VDSO_VDSO_H > + > #ifndef __ASSEMBLY__ > > #include <asm/asm.h> > @@ -16,6 +19,7 @@ struct vdso_pcpu_data { > > struct loongarch_vdso_data { > struct vdso_pcpu_data pdata[NR_CPUS]; > + struct vdso_rng_data rng_data; > }; > > /* > @@ -63,3 +67,5 @@ static inline unsigned long get_vdso_data(void) > } > > #endif /* __ASSEMBLY__ */ > + > +#endif > diff --git a/arch/loongarch/include/asm/vdso/vsyscall.h b/arch/loongarch/include/asm/vdso/vsyscall.h > index 5de615383a22..b1273ce6f140 100644 > --- a/arch/loongarch/include/asm/vdso/vsyscall.h > +++ b/arch/loongarch/include/asm/vdso/vsyscall.h > @@ -8,6 +8,7 @@ > #include <vdso/datapage.h> > > extern struct vdso_data *vdso_data; > +extern struct vdso_rng_data *vdso_rng_data; > > /* > * Update the vDSO data page to keep in sync with kernel timekeeping. > @@ -19,6 +20,13 @@ struct vdso_data *__loongarch_get_k_vdso_data(void) > } > #define __arch_get_k_vdso_data __loongarch_get_k_vdso_data > > +static __always_inline > +struct vdso_rng_data *__loongarch_get_k_vdso_rng_data(void) > +{ > + return vdso_rng_data; > +} > +#define __arch_get_k_vdso_rng_data __loongarch_get_k_vdso_rng_data > + > /* The asm-generic header needs to be included after the definitions above */ > #include <asm-generic/vdso/vsyscall.h> > > diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c > index bee9f7a3108f..ab258878d551 100644 > --- a/arch/loongarch/kernel/asm-offsets.c > +++ b/arch/loongarch/kernel/asm-offsets.c > @@ -14,6 +14,7 @@ > #include <asm/ptrace.h> > #include <asm/processor.h> > #include <asm/ftrace.h> > +#include <asm/vdso/vdso.h> > > static void __used output_ptreg_defines(void) > { > diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c > index 90dfccb41c14..2af05ba5f121 100644 > --- a/arch/loongarch/kernel/vdso.c > +++ b/arch/loongarch/kernel/vdso.c > @@ -22,6 +22,7 @@ > #include <vdso/helpers.h> > #include <vdso/vsyscall.h> > #include <vdso/datapage.h> > +#include <generated/asm-offsets.h> > #include <generated/vdso-offsets.h> > > extern char vdso_start[], vdso_end[]; > @@ -37,6 +38,7 @@ static union { > static struct page *vdso_pages[] = { NULL }; > struct vdso_data *vdso_data = generic_vdso_data.data; > struct vdso_pcpu_data *vdso_pdata = loongarch_vdso_data.vdata.pdata; > +struct vdso_rng_data *vdso_rng_data = &loongarch_vdso_data.vdata.rng_data; > > static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma) > { > diff --git a/arch/loongarch/vdso/Makefile b/arch/loongarch/vdso/Makefile > index d724d46b07c8..19249d4b3542 100644 > --- a/arch/loongarch/vdso/Makefile > +++ b/arch/loongarch/vdso/Makefile > @@ -4,7 +4,8 @@ > # Include the generic Makefile to check the built vdso. > include $(srctree)/lib/vdso/Makefile > > -obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o > +obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o vgetrandom.o \ > + vgetrandom-chacha.o > > # Common compiler flags between ABIs. 
> ccflags-vdso := \ > @@ -29,6 +30,10 @@ ifneq ($(c-gettimeofday-y),) > CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y) > endif > > +ifneq ($(c-getrandom-y),) > + CFLAGS_vgetrandom.o += -include $(c-getrandom-y) > +endif > + > # VDSO linker flags. > ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \ > $(filter -E%,$(KBUILD_CFLAGS)) -nostdlib -shared \ > diff --git a/arch/loongarch/vdso/vdso.lds.S b/arch/loongarch/vdso/vdso.lds.S > index 56ad855896de..2c965a597d9e 100644 > --- a/arch/loongarch/vdso/vdso.lds.S > +++ b/arch/loongarch/vdso/vdso.lds.S > @@ -63,6 +63,7 @@ VERSION > __vdso_clock_gettime; > __vdso_gettimeofday; > __vdso_rt_sigreturn; > + __vdso_getrandom; I also said before that keep __vdso_rt_sigreturn be the last one. > local: *; > }; > } > diff --git a/arch/loongarch/vdso/vgetrandom-chacha.S b/arch/loongarch/vdso/vgetrandom-chacha.S > new file mode 100644 > index 000000000000..7e86a50f6e85 > --- /dev/null > +++ b/arch/loongarch/vdso/vgetrandom-chacha.S > @@ -0,0 +1,242 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved. > + */ > + > +#include <asm/asm.h> > +#include <asm/regdef.h> > +#include <linux/linkage.h> > + > +.text > + > +/* Salsa20 quarter-round */ > +.macro QR a b c d > + add.w \a, \a, \b > + xor \d, \d, \a > + rotri.w \d, \d, 16 > + > + add.w \c, \c, \d > + xor \b, \b, \c > + rotri.w \b, \b, 20 > + > + add.w \a, \a, \b > + xor \d, \d, \a > + rotri.w \d, \d, 24 > + > + add.w \c, \c, \d > + xor \b, \b, \c > + rotri.w \b, \b, 25 > +.endm > + > +/* > + * Very basic LoongArch implementation of ChaCha20. Produces a given positive > + * number of blocks of output with a nonce of 0, taking an input key and > + * 8-byte counter. Importantly does not spill to the stack. Its arguments > + * are: > + * > + * a0: output bytes > + * a1: 32-byte key input > + * a2: 8-byte counter input/output > + * a3: number of 64-byte blocks to write to output > + */ > +SYM_FUNC_START(__arch_chacha20_blocks_nostack) > + > +/* We don't need a frame pointer */ > +#define s9 fp > + > +#define output a0 > +#define key a1 > +#define counter a2 > +#define nblocks a3 > +#define i a4 > +#define state0 s0 > +#define state1 s1 > +#define state2 s2 > +#define state3 s3 > +#define state4 s4 > +#define state5 s5 > +#define state6 s6 > +#define state7 s7 > +#define state8 s8 > +#define state9 s9 > +#define state10 a5 > +#define state11 a6 > +#define state12 a7 > +#define state13 t0 > +#define state14 t1 > +#define state15 t2 > +#define cnt_lo t3 > +#define cnt_hi t4 > +#define copy0 t5 > +#define copy1 t6 > +#define copy2 t7 > + > +/* Reuse i as copy3 */ > +#define copy3 i > + > + /* > + * The ABI requires s0-s9 saved, and sp aligned to 16-byte. > + * This does not violate the stack-less requirement: no sensitive data > + * is spilled onto the stack. 
> + */ > + PTR_ADDI sp, sp, (-SZREG * 10) & STACK_ALIGN > + REG_S s0, sp, 0 > + REG_S s1, sp, SZREG > + REG_S s2, sp, SZREG * 2 > + REG_S s3, sp, SZREG * 3 > + REG_S s4, sp, SZREG * 4 > + REG_S s5, sp, SZREG * 5 > + REG_S s6, sp, SZREG * 6 > + REG_S s7, sp, SZREG * 7 > + REG_S s8, sp, SZREG * 8 > + REG_S s9, sp, SZREG * 9 > + > + li.w copy0, 0x61707865 > + li.w copy1, 0x3320646e > + li.w copy2, 0x79622d32 > + > + ld.w cnt_lo, counter, 0 > + ld.w cnt_hi, counter, 4 > + > +.Lblock: > + /* state[0,1,2,3] = "expand 32-byte k" */ > + move state0, copy0 > + move state1, copy1 > + move state2, copy2 > + li.w state3, 0x6b206574 > + > + /* state[4,5,..,11] = key */ > + ld.w state4, key, 0 > + ld.w state5, key, 4 > + ld.w state6, key, 8 > + ld.w state7, key, 12 > + ld.w state8, key, 16 > + ld.w state9, key, 20 > + ld.w state10, key, 24 > + ld.w state11, key, 28 > + > + /* state[12,13] = counter */ > + move state12, cnt_lo > + move state13, cnt_hi > + > + /* state[14,15] = 0 */ > + move state14, zero > + move state15, zero > + > + li.w i, 10 > +.Lpermute: > + /* odd round */ > + QR state0, state4, state8, state12 > + QR state1, state5, state9, state13 > + QR state2, state6, state10, state14 > + QR state3, state7, state11, state15 > + > + /* even round */ > + QR state0, state5, state10, state15 > + QR state1, state6, state11, state12 > + QR state2, state7, state8, state13 > + QR state3, state4, state9, state14 > + > + addi.w i, i, -1 > + bnez i, .Lpermute > + > + /* > + * copy[3] = "expa", materialize it here because copy[3] shares the > + * same register with i which just became dead. > + */ > + li.w copy3, 0x6b206574 > + > + /* output[0,1,2,3] = copy[0,1,2,3] + state[0,1,2,3] */ > + add.w state0, state0, copy0 > + add.w state1, state1, copy1 > + add.w state2, state2, copy2 > + add.w state3, state3, copy3 > + st.w state0, output, 0 > + st.w state1, output, 4 > + st.w state2, output, 8 > + st.w state3, output, 12 > + > + /* from now on state[0,1,2,3] are scratch registers */ > + > + /* state[0,1,2,3] = lo32(key) */ > + ld.w state0, key, 0 > + ld.w state1, key, 4 > + ld.w state2, key, 8 > + ld.w state3, key, 12 > + > + /* output[4,5,6,7] = state[0,1,2,3] + state[4,5,6,7] */ > + add.w state4, state4, state0 > + add.w state5, state5, state1 > + add.w state6, state6, state2 > + add.w state7, state7, state3 > + st.w state4, output, 16 > + st.w state5, output, 20 > + st.w state6, output, 24 > + st.w state7, output, 28 > + > + /* state[0,1,2,3] = hi32(key) */ > + ld.w state0, key, 16 > + ld.w state1, key, 20 > + ld.w state2, key, 24 > + ld.w state3, key, 28 > + > + /* output[8,9,10,11] = state[0,1,2,3] + state[8,9,10,11] */ > + add.w state8, state8, state0 > + add.w state9, state9, state1 > + add.w state10, state10, state2 > + add.w state11, state11, state3 > + st.w state8, output, 32 > + st.w state9, output, 36 > + st.w state10, output, 40 > + st.w state11, output, 44 > + > + /* output[12,13,14,15] = state[12,13,14,15] + [cnt_lo, cnt_hi, 0, 0] */ > + add.w state12, state12, cnt_lo > + add.w state13, state13, cnt_hi > + st.w state12, output, 48 > + st.w state13, output, 52 > + st.w state14, output, 56 > + st.w state15, output, 60 > + > + /* ++counter */ > + addi.w cnt_lo, cnt_lo, 1 > + sltui state0, cnt_lo, 1 > + add.w cnt_hi, cnt_hi, state0 > + > + /* output += 64 */ > + PTR_ADDI output, output, 64 > + /* --nblocks */ > + PTR_ADDI nblocks, nblocks, -1 > + bnez nblocks, .Lblock > + > + /* counter = [cnt_lo, cnt_hi] */ > + st.w cnt_lo, counter, 0 > + st.w cnt_hi, counter, 4 > + > + /* > + * Zero out the 
potentially sensitive regs, in case nothing uses these > + * again. As at now copy[0,1,2,3] just contains "expand 32-byte k" and > + * state[0,...,9] are s0-s9 those we'll restore in the epilogue, so we > + * only need to zero state[11,...,15]. > + */ > + move state10, zero > + move state11, zero > + move state12, zero > + move state13, zero > + move state14, zero > + move state15, zero > + > + REG_L s0, sp, 0 > + REG_L s1, sp, SZREG > + REG_L s2, sp, SZREG * 2 > + REG_L s3, sp, SZREG * 3 > + REG_L s4, sp, SZREG * 4 > + REG_L s5, sp, SZREG * 5 > + REG_L s6, sp, SZREG * 6 > + REG_L s7, sp, SZREG * 7 > + REG_L s8, sp, SZREG * 8 > + REG_L s9, sp, SZREG * 9 > + PTR_ADDI sp, sp, -((-SZREG * 10) & STACK_ALIGN) > + > + jr ra > +SYM_FUNC_END(__arch_chacha20_blocks_nostack) > diff --git a/arch/loongarch/vdso/vgetrandom.c b/arch/loongarch/vdso/vgetrandom.c > new file mode 100644 > index 000000000000..68e44b3f1d49 > --- /dev/null > +++ b/arch/loongarch/vdso/vgetrandom.c > @@ -0,0 +1,15 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved. > + */ > +#include <linux/types.h> > + > +ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags, > + void *opaque_state, size_t opaque_len); Please see commit 42874e4eb35bdfc54f8514685e50434098ba4f6, it is better to avoid such hacks. > + > +ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags, > + void *opaque_state, size_t opaque_len) > +{ > + return __cvdso_getrandom(buffer, len, flags, opaque_state, > + opaque_len); Again, no line breaks needed. Huacai > +} > diff --git a/tools/arch/loongarch/vdso b/tools/arch/loongarch/vdso > new file mode 120000 > index 000000000000..ebda43a82db7 > --- /dev/null > +++ b/tools/arch/loongarch/vdso > @@ -0,0 +1 @@ > +../../../arch/loongarch/vdso > \ No newline at end of file > diff --git a/tools/testing/selftests/vDSO/Makefile b/tools/testing/selftests/vDSO/Makefile > index e21e78aae24d..606ce5f5c2a4 100644 > --- a/tools/testing/selftests/vDSO/Makefile > +++ b/tools/testing/selftests/vDSO/Makefile > @@ -1,6 +1,6 @@ > # SPDX-License-Identifier: GPL-2.0 > uname_M := $(shell uname -m 2>/dev/null || echo not) > -ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/) > +ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/ -e /loongarch/s/[0-9]//g) > > TEST_GEN_PROGS := vdso_test_gettimeofday > TEST_GEN_PROGS += vdso_test_getcpu > @@ -10,7 +10,7 @@ ifeq ($(ARCH),$(filter $(ARCH),x86 x86_64)) > TEST_GEN_PROGS += vdso_standalone_test_x86 > endif > TEST_GEN_PROGS += vdso_test_correctness > -ifeq ($(uname_M),x86_64) > +ifeq ($(uname_M),$(filter $(uname_M),x86_64 loongarch64)) > TEST_GEN_PROGS += vdso_test_getrandom > TEST_GEN_PROGS += vdso_test_chacha > endif > > base-commit: ec309dd126280ca4ef4087e02246a6f61f36c8d1 > -- > 2.46.0 >
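To make Huacai's two wrapping comments concrete, here is a minimal sketch
of the same two hunks with the line breaks removed. It is assembled only
from the code quoted above and assumes the same surrounding includes; it
is not the final v6 code, and it does not address the separate point about
the __vdso_getrandom prototype:

	/* getrandom.h helper, un-wrapped (sketch only) */
	static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void)
	{
		return (const struct vdso_rng_data *)(get_vdso_data() + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE + offsetof(struct loongarch_vdso_data, rng_data));
	}

	/* vgetrandom.c entry point, un-wrapped (sketch only) */
	ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state, size_t opaque_len)
	{
		return __cvdso_getrandom(buffer, len, flags, opaque_state, opaque_len);
	}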
One small question just occurred to me:

> +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(
> +		void)
> +{
> +	return (const struct vdso_rng_data *)(
> +		get_vdso_data() +
> +		VVAR_LOONGARCH_PAGES_START * PAGE_SIZE +
> +		offsetof(struct loongarch_vdso_data, rng_data));
> +}

Did you test this in a TIMENS? On x86, I had to deal with the page
offsets switching around depending on whether there was a TIMENS. I
tested this in my test harness with some basic code like:

	if (argc == 1) {
		if (unshare(CLONE_NEWTIME))
			panic("unshare(CLONE_NEWTIME)");
		if (!fork()) {
			if (execl(argv[0], argv[0], "now-in-timens"))
				panic("execl");
		}
		wait(NULL);
		poweroff();
	}

Because unlike other namespaces, the time one only becomes active after
fork/exec.

But maybe loongarch is more organized and you don't need any special
handling in __arch_get_vdso...data() functions like I needed on x86.
Just thought I should check.

Jason
On Thu, Aug 29, 2024 at 03:27:33PM +0200, Jason A. Donenfeld wrote:
> One small question just occurred to me:
>
> > +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(
> > +		void)
> > +{
> > +	return (const struct vdso_rng_data *)(
> > +		get_vdso_data() +
> > +		VVAR_LOONGARCH_PAGES_START * PAGE_SIZE +
> > +		offsetof(struct loongarch_vdso_data, rng_data));
> > +}
>
> Did you test this in a TIMENS? On x86, I had to deal with the page
> offsets switching around depending on whether there was a TIMENS. I
> tested this in my test harness with some basic code like:
>
> 	if (argc == 1) {
> 		if (unshare(CLONE_NEWTIME))
> 			panic("unshare(CLONE_NEWTIME)");
> 		if (!fork()) {
> 			if (execl(argv[0], argv[0], "now-in-timens"))
> 				panic("execl");
> 		}
> 		wait(NULL);
> 		poweroff();
> 	}
>
> Because unlike other namespaces, the time one only becomes active after
> fork/exec.
>
> But maybe loongarch is more organized and you don't need any special
> handling in __arch_get_vdso...data() functions like I needed on x86.
> Just thought I should check.

Normal results:

vdso: 25000000 times in 0.287330836 seconds
libc: 25000000 times in 4.480710835 seconds
syscall: 25000000 times in 4.411098048 seconds

After applying

diff --git a/arch/x86/include/asm/vdso/getrandom.h b/arch/x86/include/asm/vdso/getrandom.h
index ff5334ad32a0..5cb1b318ebe3 100644
--- a/arch/x86/include/asm/vdso/getrandom.h
+++ b/arch/x86/include/asm/vdso/getrandom.h
@@ -32,8 +32,6 @@ static __always_inline ssize_t getrandom_syscall(void *buffer, size_t len, unsig
 
 static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void)
 {
-	if (IS_ENABLED(CONFIG_TIME_NS) && __vdso_data->clock_mode == VDSO_CLOCKMODE_TIMENS)
-		return (void *)&__vdso_rng_data + ((void *)&__timens_vdso_data - (void *)&__vdso_data);
 	return &__vdso_rng_data;
 }

the results are:

vdso: 25000000 times in 4.403789593 seconds
libc: 25000000 times in 4.466771093 seconds
syscall: 25000000 times in 4.428145416 seconds

The difference is that when it finds the shared data in the wrong place,
it thinks the RNG is uninitialized, so it always falls back to the
syscall, hence all three times being the same.

If you're unsure how timens handling works on loongarch, try this test
yourself and see what you get.

Jason
On Thu, 2024-08-29 at 15:37 +0200, Jason A. Donenfeld wrote: > On Thu, Aug 29, 2024 at 03:27:33PM +0200, Jason A. Donenfeld wrote: > > One small question just occurred to me: > > > > > +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data( > > > + void) > > > +{ > > > + return (const struct vdso_rng_data *)( > > > + get_vdso_data() + > > > + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE + > > > + offsetof(struct loongarch_vdso_data, rng_data)); > > > +} > > > > Did you test this in a TIMENS? On x86, I had to deal with the page > > offsets switching around depending on whether there was a TIMENS. I > > tested this in my test harness with some basic code like: > > > > if (argc == 1) { > > if (unshare(CLONE_NEWTIME)) > > panic("unshare(CLONE_NEWTIME)"); > > if (!fork()) { > > if (execl(argv[0], argv[0], "now-in-timens")) > > panic("execl"); > > } > > wait(NULL); > > poweroff(); > > } > > > > Because unlike other namespaces, the time one only becomes active after > > fork/exec. > > > > But maybe loongarch is more organized and you don't need any special > > handling in __arch_get_vdso...data() functions like I needed on x86. > > Just thought I should check. > > Normal results: > > vdso: 25000000 times in 0.287330836 seconds > libc: 25000000 times in 4.480710835 seconds > syscall: 25000000 times in 4.411098048 seconds > > After applying > > diff --git a/arch/x86/include/asm/vdso/getrandom.h b/arch/x86/include/asm/vdso/getrandom.h > index ff5334ad32a0..5cb1b318ebe3 100644 > --- a/arch/x86/include/asm/vdso/getrandom.h > +++ b/arch/x86/include/asm/vdso/getrandom.h > @@ -32,8 +32,6 @@ static __always_inline ssize_t getrandom_syscall(void *buffer, size_t len, unsig > > static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void) > { > - if (IS_ENABLED(CONFIG_TIME_NS) && __vdso_data->clock_mode == VDSO_CLOCKMODE_TIMENS) > - return (void *)&__vdso_rng_data + ((void *)&__timens_vdso_data - (void *)&__vdso_data); > return &__vdso_rng_data; > } > > the results are: > > vdso: 25000000 times in 4.403789593 seconds > libc: 25000000 times in 4.466771093 seconds > syscall: 25000000 times in 4.428145416 seconds > > The difference is that when it finds the shared data in the wrong place, > it thinks the RNG is uninitialized, so it always falls back to the > syscall, hence all three times being the same. > > If you're unsure how timens handling works on loongarch, try this test > yourself and see what you get. $ unshare -r -T --boottime $((365*24*3600)) # uptime 21:54:36 up 365 days, 5:38, 0 user, load average: 0.05, 0.08, 2.82 # /home/xry111/git-repos/linux/tools/testing/selftests/vDSO/vdso_test_getrandom bench-single vdso: 25000000 times in 0.499528591 seconds libc: 25000000 times in 6.968980040 seconds syscall: 25000000 times in 6.987357071 seconds So it seems normal in a time ns. 
And from a comment in arch/loongarch/include/asm/vdso/vdso.h:

/*
 * The layout of vvar:
 *
 *                       high
 * +---------------------+--------------------------+
 * | loongarch vdso data | LOONGARCH_VDSO_DATA_SIZE |
 * +---------------------+--------------------------+
 * |  time-ns vdso data  |         PAGE_SIZE        |
 * +---------------------+--------------------------+
 * |  generic vdso data  |         PAGE_SIZE        |
 * +---------------------+--------------------------+
 *                       low
 */

And VVAR_LOONGARCH_PAGES_START is 2:

enum vvar_pages {
	VVAR_GENERIC_PAGE_OFFSET,
	VVAR_TIMENS_PAGE_OFFSET,
	VVAR_LOONGARCH_PAGES_START,
	VVAR_LOONGARCH_PAGES_END = VVAR_LOONGARCH_PAGES_START +
				   LOONGARCH_VDSO_DATA_PAGES - 1,
	VVAR_NR_PAGES,
};

So get_vdso_data() + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE should have
already "jumped over" the time-ns vdso data.

OTOH it seems we are wasting a page if !CONFIG_TIME_NS. Maybe:

enum vvar_pages {
	VVAR_GENERIC_PAGE_OFFSET,
#ifdef CONFIG_TIME_NS
	VVAR_TIMENS_PAGE_OFFSET,
#endif
	VVAR_LOONGARCH_PAGES_START,
	VVAR_LOONGARCH_PAGES_END = VVAR_LOONGARCH_PAGES_START +
				   LOONGARCH_VDSO_DATA_PAGES - 1,
	VVAR_NR_PAGES,
};

Tiezhu: what do you think?
On Thu, Aug 29, 2024 at 4:06 PM Xi Ruoyao <xry111@xry111.site> wrote: > > On Thu, 2024-08-29 at 15:37 +0200, Jason A. Donenfeld wrote: > > On Thu, Aug 29, 2024 at 03:27:33PM +0200, Jason A. Donenfeld wrote: > > > One small question just occurred to me: > > > > > > > +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data( > > > > + void) > > > > +{ > > > > + return (const struct vdso_rng_data *)( > > > > + get_vdso_data() + > > > > + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE + > > > > + offsetof(struct loongarch_vdso_data, rng_data)); > > > > +} > > > > > > Did you test this in a TIMENS? On x86, I had to deal with the page > > > offsets switching around depending on whether there was a TIMENS. I > > > tested this in my test harness with some basic code like: > > > > > > if (argc == 1) { > > > if (unshare(CLONE_NEWTIME)) > > > panic("unshare(CLONE_NEWTIME)"); > > > if (!fork()) { > > > if (execl(argv[0], argv[0], "now-in-timens")) > > > panic("execl"); > > > } > > > wait(NULL); > > > poweroff(); > > > } > > > > > > Because unlike other namespaces, the time one only becomes active after > > > fork/exec. > > > > > > But maybe loongarch is more organized and you don't need any special > > > handling in __arch_get_vdso...data() functions like I needed on x86. > > > Just thought I should check. > > > > Normal results: > > > > vdso: 25000000 times in 0.287330836 seconds > > libc: 25000000 times in 4.480710835 seconds > > syscall: 25000000 times in 4.411098048 seconds > > > > After applying > > > > diff --git a/arch/x86/include/asm/vdso/getrandom.h b/arch/x86/include/asm/vdso/getrandom.h > > index ff5334ad32a0..5cb1b318ebe3 100644 > > --- a/arch/x86/include/asm/vdso/getrandom.h > > +++ b/arch/x86/include/asm/vdso/getrandom.h > > @@ -32,8 +32,6 @@ static __always_inline ssize_t getrandom_syscall(void *buffer, size_t len, unsig > > > > static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void) > > { > > - if (IS_ENABLED(CONFIG_TIME_NS) && __vdso_data->clock_mode == VDSO_CLOCKMODE_TIMENS) > > - return (void *)&__vdso_rng_data + ((void *)&__timens_vdso_data - (void *)&__vdso_data); > > return &__vdso_rng_data; > > } > > > > the results are: > > > > vdso: 25000000 times in 4.403789593 seconds > > libc: 25000000 times in 4.466771093 seconds > > syscall: 25000000 times in 4.428145416 seconds > > > > The difference is that when it finds the shared data in the wrong place, > > it thinks the RNG is uninitialized, so it always falls back to the > > syscall, hence all three times being the same. > > > > If you're unsure how timens handling works on loongarch, try this test > > yourself and see what you get. > > $ unshare -r -T --boottime $((365*24*3600)) > # uptime > 21:54:36 up 365 days, 5:38, 0 user, load average: 0.05, 0.08, 2.82 > # /home/xry111/git-repos/linux/tools/testing/selftests/vDSO/vdso_test_getrandom bench-single > vdso: 25000000 times in 0.499528591 seconds > libc: 25000000 times in 6.968980040 seconds > syscall: 25000000 times in 6.987357071 seconds > > So it seems normal in a time ns. 
> > And from a comment in arch/loongarch/include/asm/vdso/vdso.h: > > /* > * The layout of vvar: > * > * high > * +---------------------+--------------------------+ > * | loongarch vdso data | LOONGARCH_VDSO_DATA_SIZE | > * +---------------------+--------------------------+ > * | time-ns vdso data | PAGE_SIZE | > * +---------------------+--------------------------+ > * | generic vdso data | PAGE_SIZE | > * +---------------------+--------------------------+ > * low > */ > > And VVAR_LOONGARCH_PAGES_START is 2: > > enum vvar_pages { > VVAR_GENERIC_PAGE_OFFSET, > VVAR_TIMENS_PAGE_OFFSET, > VVAR_LOONGARCH_PAGES_START, > VVAR_LOONGARCH_PAGES_END = VVAR_LOONGARCH_PAGES_START + > LOONGARCH_VDSO_DATA_PAGES - 1, > VVAR_NR_PAGES, > }; > > So get_vdso_data() + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE should have > already "jumped over" the time-ns vdso data. Oh good. Thanks for checking. So it sounds like there's just Huacai's set of comments and we're good.
On Thu, 2024-08-29 at 16:44 +0200, Jason A. Donenfeld wrote:
> > So get_vdso_data() + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE should have
> > already "jumped over" the time-ns vdso data.
>
> Oh good. Thanks for checking. So it sounds like there's just Huacai's
> set of comments and we're good.

Both Huacai and I (we've discussed a little off the list) think it seems
more natural to separate the implementation and the self test into two
patches. Do you think it's acceptable? If not, we can live with one
consolidated patch though.
On Fri, Aug 30, 2024 at 05:05:38PM +0800, Xi Ruoyao wrote:
> On Thu, 2024-08-29 at 16:44 +0200, Jason A. Donenfeld wrote:
> > > So get_vdso_data() + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE should have
> > > already "jumped over" the time-ns vdso data.
> >
> > Oh good. Thanks for checking. So it sounds like there's just Huacai's
> > set of comments and we're good.
>
> Both Huacai and I (we've discussed a little off the list) think it seems
> more natural to separate the implementation and the self test into two
> patches. Do you think it's acceptable? If not we can live with one
> consolidated patch though.

I'm not such a fan, as wiring up the test makes the whole thing more
"hermetic", but it also doesn't matter to me that much, so if you two
prefer that, that's fine with me and I'll commit it like that.
On Thu, 2024-08-29 at 21:18 +0800, Huacai Chen wrote:
> > -obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o
> > +obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o vgetrandom.o \
> > +		 vgetrandom-chacha.o

Huacai: do you prefer to remove this line break as well, or does the
Makefile still have a line width limit?
On Sat, Aug 31, 2024 at 2:40 PM Xi Ruoyao <xry111@xry111.site> wrote:
>
> On Thu, 2024-08-29 at 21:18 +0800, Huacai Chen wrote:
> > > -obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o
> > > +obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o vgetrandom.o \
> > > +		 vgetrandom-chacha.o
>
> Huacai: do you prefer to remove this line break as well, or does the
> Makefile still have a line width limit?
Also no limit, but a Makefile is a bit different because there is no
"statement", so it doesn't affect our understanding.

So, the line break is fine here (but my own preference is to keep
sigreturn.o last).

Huacai

>
> --
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University
On 31/08/2024 at 09:10, Huacai Chen wrote:
> On Sat, Aug 31, 2024 at 2:40 PM Xi Ruoyao <xry111@xry111.site> wrote:
>>
>> On Thu, 2024-08-29 at 21:18 +0800, Huacai Chen wrote:
>>>> -obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o
>>>> +obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o vgetrandom.o \
>>>> +		 vgetrandom-chacha.o
>>
>> Huacai: do you prefer to remove this line break as well, or does the
>> Makefile still have a line width limit?
> Also no limit, but a Makefile is a bit different because there is no
> "statement", so it doesn't affect our understanding.
>
> So, the line break is fine here (but my own preference is to keep
> sigreturn.o last).

To avoid the line break you can leave the first line unmodified and add:

	obj-vdso-y += vgetrandom.o vgetrandom-chacha.o

Christophe
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 70f169210b52..14821c2aba5b 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -190,6 +190,7 @@ config LOONGARCH select TRACE_IRQFLAGS_SUPPORT select USE_PERCPU_NUMA_NODE_ID select USER_STACKTRACE_SUPPORT + select VDSO_GETRANDOM select ZONE_DMA32 config 32BIT diff --git a/arch/loongarch/include/asm/vdso/getrandom.h b/arch/loongarch/include/asm/vdso/getrandom.h new file mode 100644 index 000000000000..04c991f6921d --- /dev/null +++ b/arch/loongarch/include/asm/vdso/getrandom.h @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved. + */ +#ifndef __ASM_VDSO_GETRANDOM_H +#define __ASM_VDSO_GETRANDOM_H + +#ifndef __ASSEMBLY__ + +#include <asm/unistd.h> +#include <asm/vdso/vdso.h> + +static __always_inline ssize_t getrandom_syscall(void *_buffer, + size_t _len, + unsigned int _flags) +{ + register long ret asm("a0"); + register long nr asm("a7") = __NR_getrandom; + register void *buffer asm("a0") = _buffer; + register size_t len asm("a1") = _len; + register unsigned int flags asm("a2") = _flags; + + asm volatile( + " syscall 0\n" + : "+r" (ret) + : "r" (nr), "r" (buffer), "r" (len), "r" (flags) + : "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7", "$t8", + "memory"); + + return ret; +} + +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data( + void) +{ + return (const struct vdso_rng_data *)( + get_vdso_data() + + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE + + offsetof(struct loongarch_vdso_data, rng_data)); +} + +#endif /* !__ASSEMBLY__ */ + +#endif /* __ASM_VDSO_GETRANDOM_H */ diff --git a/arch/loongarch/include/asm/vdso/vdso.h b/arch/loongarch/include/asm/vdso/vdso.h index 5a12309d9fb5..e31ac7474513 100644 --- a/arch/loongarch/include/asm/vdso/vdso.h +++ b/arch/loongarch/include/asm/vdso/vdso.h @@ -4,6 +4,9 @@ * Copyright (C) 2020-2022 Loongson Technology Corporation Limited */ +#ifndef _ASM_VDSO_VDSO_H +#define _ASM_VDSO_VDSO_H + #ifndef __ASSEMBLY__ #include <asm/asm.h> @@ -16,6 +19,7 @@ struct vdso_pcpu_data { struct loongarch_vdso_data { struct vdso_pcpu_data pdata[NR_CPUS]; + struct vdso_rng_data rng_data; }; /* @@ -63,3 +67,5 @@ static inline unsigned long get_vdso_data(void) } #endif /* __ASSEMBLY__ */ + +#endif diff --git a/arch/loongarch/include/asm/vdso/vsyscall.h b/arch/loongarch/include/asm/vdso/vsyscall.h index 5de615383a22..b1273ce6f140 100644 --- a/arch/loongarch/include/asm/vdso/vsyscall.h +++ b/arch/loongarch/include/asm/vdso/vsyscall.h @@ -8,6 +8,7 @@ #include <vdso/datapage.h> extern struct vdso_data *vdso_data; +extern struct vdso_rng_data *vdso_rng_data; /* * Update the vDSO data page to keep in sync with kernel timekeeping. 
@@ -19,6 +20,13 @@ struct vdso_data *__loongarch_get_k_vdso_data(void) } #define __arch_get_k_vdso_data __loongarch_get_k_vdso_data +static __always_inline +struct vdso_rng_data *__loongarch_get_k_vdso_rng_data(void) +{ + return vdso_rng_data; +} +#define __arch_get_k_vdso_rng_data __loongarch_get_k_vdso_rng_data + /* The asm-generic header needs to be included after the definitions above */ #include <asm-generic/vdso/vsyscall.h> diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c index bee9f7a3108f..ab258878d551 100644 --- a/arch/loongarch/kernel/asm-offsets.c +++ b/arch/loongarch/kernel/asm-offsets.c @@ -14,6 +14,7 @@ #include <asm/ptrace.h> #include <asm/processor.h> #include <asm/ftrace.h> +#include <asm/vdso/vdso.h> static void __used output_ptreg_defines(void) { diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c index 90dfccb41c14..2af05ba5f121 100644 --- a/arch/loongarch/kernel/vdso.c +++ b/arch/loongarch/kernel/vdso.c @@ -22,6 +22,7 @@ #include <vdso/helpers.h> #include <vdso/vsyscall.h> #include <vdso/datapage.h> +#include <generated/asm-offsets.h> #include <generated/vdso-offsets.h> extern char vdso_start[], vdso_end[]; @@ -37,6 +38,7 @@ static union { static struct page *vdso_pages[] = { NULL }; struct vdso_data *vdso_data = generic_vdso_data.data; struct vdso_pcpu_data *vdso_pdata = loongarch_vdso_data.vdata.pdata; +struct vdso_rng_data *vdso_rng_data = &loongarch_vdso_data.vdata.rng_data; static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma) { diff --git a/arch/loongarch/vdso/Makefile b/arch/loongarch/vdso/Makefile index d724d46b07c8..19249d4b3542 100644 --- a/arch/loongarch/vdso/Makefile +++ b/arch/loongarch/vdso/Makefile @@ -4,7 +4,8 @@ # Include the generic Makefile to check the built vdso. include $(srctree)/lib/vdso/Makefile -obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o +obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o vgetrandom.o \ + vgetrandom-chacha.o # Common compiler flags between ABIs. ccflags-vdso := \ @@ -29,6 +30,10 @@ ifneq ($(c-gettimeofday-y),) CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y) endif +ifneq ($(c-getrandom-y),) + CFLAGS_vgetrandom.o += -include $(c-getrandom-y) +endif + # VDSO linker flags. ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \ $(filter -E%,$(KBUILD_CFLAGS)) -nostdlib -shared \ diff --git a/arch/loongarch/vdso/vdso.lds.S b/arch/loongarch/vdso/vdso.lds.S index 56ad855896de..2c965a597d9e 100644 --- a/arch/loongarch/vdso/vdso.lds.S +++ b/arch/loongarch/vdso/vdso.lds.S @@ -63,6 +63,7 @@ VERSION __vdso_clock_gettime; __vdso_gettimeofday; __vdso_rt_sigreturn; + __vdso_getrandom; local: *; }; } diff --git a/arch/loongarch/vdso/vgetrandom-chacha.S b/arch/loongarch/vdso/vgetrandom-chacha.S new file mode 100644 index 000000000000..7e86a50f6e85 --- /dev/null +++ b/arch/loongarch/vdso/vgetrandom-chacha.S @@ -0,0 +1,242 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved. + */ + +#include <asm/asm.h> +#include <asm/regdef.h> +#include <linux/linkage.h> + +.text + +/* Salsa20 quarter-round */ +.macro QR a b c d + add.w \a, \a, \b + xor \d, \d, \a + rotri.w \d, \d, 16 + + add.w \c, \c, \d + xor \b, \b, \c + rotri.w \b, \b, 20 + + add.w \a, \a, \b + xor \d, \d, \a + rotri.w \d, \d, 24 + + add.w \c, \c, \d + xor \b, \b, \c + rotri.w \b, \b, 25 +.endm + +/* + * Very basic LoongArch implementation of ChaCha20. 
Produces a given positive + * number of blocks of output with a nonce of 0, taking an input key and + * 8-byte counter. Importantly does not spill to the stack. Its arguments + * are: + * + * a0: output bytes + * a1: 32-byte key input + * a2: 8-byte counter input/output + * a3: number of 64-byte blocks to write to output + */ +SYM_FUNC_START(__arch_chacha20_blocks_nostack) + +/* We don't need a frame pointer */ +#define s9 fp + +#define output a0 +#define key a1 +#define counter a2 +#define nblocks a3 +#define i a4 +#define state0 s0 +#define state1 s1 +#define state2 s2 +#define state3 s3 +#define state4 s4 +#define state5 s5 +#define state6 s6 +#define state7 s7 +#define state8 s8 +#define state9 s9 +#define state10 a5 +#define state11 a6 +#define state12 a7 +#define state13 t0 +#define state14 t1 +#define state15 t2 +#define cnt_lo t3 +#define cnt_hi t4 +#define copy0 t5 +#define copy1 t6 +#define copy2 t7 + +/* Reuse i as copy3 */ +#define copy3 i + + /* + * The ABI requires s0-s9 saved, and sp aligned to 16-byte. + * This does not violate the stack-less requirement: no sensitive data + * is spilled onto the stack. + */ + PTR_ADDI sp, sp, (-SZREG * 10) & STACK_ALIGN + REG_S s0, sp, 0 + REG_S s1, sp, SZREG + REG_S s2, sp, SZREG * 2 + REG_S s3, sp, SZREG * 3 + REG_S s4, sp, SZREG * 4 + REG_S s5, sp, SZREG * 5 + REG_S s6, sp, SZREG * 6 + REG_S s7, sp, SZREG * 7 + REG_S s8, sp, SZREG * 8 + REG_S s9, sp, SZREG * 9 + + li.w copy0, 0x61707865 + li.w copy1, 0x3320646e + li.w copy2, 0x79622d32 + + ld.w cnt_lo, counter, 0 + ld.w cnt_hi, counter, 4 + +.Lblock: + /* state[0,1,2,3] = "expand 32-byte k" */ + move state0, copy0 + move state1, copy1 + move state2, copy2 + li.w state3, 0x6b206574 + + /* state[4,5,..,11] = key */ + ld.w state4, key, 0 + ld.w state5, key, 4 + ld.w state6, key, 8 + ld.w state7, key, 12 + ld.w state8, key, 16 + ld.w state9, key, 20 + ld.w state10, key, 24 + ld.w state11, key, 28 + + /* state[12,13] = counter */ + move state12, cnt_lo + move state13, cnt_hi + + /* state[14,15] = 0 */ + move state14, zero + move state15, zero + + li.w i, 10 +.Lpermute: + /* odd round */ + QR state0, state4, state8, state12 + QR state1, state5, state9, state13 + QR state2, state6, state10, state14 + QR state3, state7, state11, state15 + + /* even round */ + QR state0, state5, state10, state15 + QR state1, state6, state11, state12 + QR state2, state7, state8, state13 + QR state3, state4, state9, state14 + + addi.w i, i, -1 + bnez i, .Lpermute + + /* + * copy[3] = "expa", materialize it here because copy[3] shares the + * same register with i which just became dead. 
+ */ + li.w copy3, 0x6b206574 + + /* output[0,1,2,3] = copy[0,1,2,3] + state[0,1,2,3] */ + add.w state0, state0, copy0 + add.w state1, state1, copy1 + add.w state2, state2, copy2 + add.w state3, state3, copy3 + st.w state0, output, 0 + st.w state1, output, 4 + st.w state2, output, 8 + st.w state3, output, 12 + + /* from now on state[0,1,2,3] are scratch registers */ + + /* state[0,1,2,3] = lo32(key) */ + ld.w state0, key, 0 + ld.w state1, key, 4 + ld.w state2, key, 8 + ld.w state3, key, 12 + + /* output[4,5,6,7] = state[0,1,2,3] + state[4,5,6,7] */ + add.w state4, state4, state0 + add.w state5, state5, state1 + add.w state6, state6, state2 + add.w state7, state7, state3 + st.w state4, output, 16 + st.w state5, output, 20 + st.w state6, output, 24 + st.w state7, output, 28 + + /* state[0,1,2,3] = hi32(key) */ + ld.w state0, key, 16 + ld.w state1, key, 20 + ld.w state2, key, 24 + ld.w state3, key, 28 + + /* output[8,9,10,11] = state[0,1,2,3] + state[8,9,10,11] */ + add.w state8, state8, state0 + add.w state9, state9, state1 + add.w state10, state10, state2 + add.w state11, state11, state3 + st.w state8, output, 32 + st.w state9, output, 36 + st.w state10, output, 40 + st.w state11, output, 44 + + /* output[12,13,14,15] = state[12,13,14,15] + [cnt_lo, cnt_hi, 0, 0] */ + add.w state12, state12, cnt_lo + add.w state13, state13, cnt_hi + st.w state12, output, 48 + st.w state13, output, 52 + st.w state14, output, 56 + st.w state15, output, 60 + + /* ++counter */ + addi.w cnt_lo, cnt_lo, 1 + sltui state0, cnt_lo, 1 + add.w cnt_hi, cnt_hi, state0 + + /* output += 64 */ + PTR_ADDI output, output, 64 + /* --nblocks */ + PTR_ADDI nblocks, nblocks, -1 + bnez nblocks, .Lblock + + /* counter = [cnt_lo, cnt_hi] */ + st.w cnt_lo, counter, 0 + st.w cnt_hi, counter, 4 + + /* + * Zero out the potentially sensitive regs, in case nothing uses these + * again. As at now copy[0,1,2,3] just contains "expand 32-byte k" and + * state[0,...,9] are s0-s9 those we'll restore in the epilogue, so we + * only need to zero state[11,...,15]. + */ + move state10, zero + move state11, zero + move state12, zero + move state13, zero + move state14, zero + move state15, zero + + REG_L s0, sp, 0 + REG_L s1, sp, SZREG + REG_L s2, sp, SZREG * 2 + REG_L s3, sp, SZREG * 3 + REG_L s4, sp, SZREG * 4 + REG_L s5, sp, SZREG * 5 + REG_L s6, sp, SZREG * 6 + REG_L s7, sp, SZREG * 7 + REG_L s8, sp, SZREG * 8 + REG_L s9, sp, SZREG * 9 + PTR_ADDI sp, sp, -((-SZREG * 10) & STACK_ALIGN) + + jr ra +SYM_FUNC_END(__arch_chacha20_blocks_nostack) diff --git a/arch/loongarch/vdso/vgetrandom.c b/arch/loongarch/vdso/vgetrandom.c new file mode 100644 index 000000000000..68e44b3f1d49 --- /dev/null +++ b/arch/loongarch/vdso/vgetrandom.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved. 
+ */ +#include <linux/types.h> + +ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags, + void *opaque_state, size_t opaque_len); + +ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags, + void *opaque_state, size_t opaque_len) +{ + return __cvdso_getrandom(buffer, len, flags, opaque_state, + opaque_len); +} diff --git a/tools/arch/loongarch/vdso b/tools/arch/loongarch/vdso new file mode 120000 index 000000000000..ebda43a82db7 --- /dev/null +++ b/tools/arch/loongarch/vdso @@ -0,0 +1 @@ +../../../arch/loongarch/vdso \ No newline at end of file diff --git a/tools/testing/selftests/vDSO/Makefile b/tools/testing/selftests/vDSO/Makefile index e21e78aae24d..606ce5f5c2a4 100644 --- a/tools/testing/selftests/vDSO/Makefile +++ b/tools/testing/selftests/vDSO/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 uname_M := $(shell uname -m 2>/dev/null || echo not) -ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/) +ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/ -e /loongarch/s/[0-9]//g) TEST_GEN_PROGS := vdso_test_gettimeofday TEST_GEN_PROGS += vdso_test_getcpu @@ -10,7 +10,7 @@ ifeq ($(ARCH),$(filter $(ARCH),x86 x86_64)) TEST_GEN_PROGS += vdso_standalone_test_x86 endif TEST_GEN_PROGS += vdso_test_correctness -ifeq ($(uname_M),x86_64) +ifeq ($(uname_M),$(filter $(uname_M),x86_64 loongarch64)) TEST_GEN_PROGS += vdso_test_getrandom TEST_GEN_PROGS += vdso_test_chacha endif
Hook up the generic vDSO implementation to the LoongArch vDSO data page by providing the required __arch_chacha20_blocks_nostack, __arch_get_k_vdso_rng_data, and getrandom_syscall implementations. Also enable the vDSO getrandom tests for LoongArch: create the symlink to the arch/loongarch/vdso directory, and correctly set the ARCH variable for LoongArch. Signed-off-by: Xi Ruoyao <xry111@xry111.site> --- Cc: linux-crypto@vger.kernel.org Cc: loongarch@lists.linux.dev Cc: linux-kernel@vger.kernel.org Cc: Jinyang He <hejinyang@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Passed vdso_test_getrandom and vdso_test_chacha tests. Benchmark results: vdso_test_getrandom bench-single: vdso: 25000000 times in 0.499490289 seconds libc: 25000000 times in 6.963829873 seconds syscall: 25000000 times in 6.983413486 seconds vdso_test_getrandom bench-multi: vdso: 25000000 x 256 times in 28.703710823 seconds libc: 25000000 x 256 times in 356.835801784 seconds syscall: 25000000 x 256 times in 338.525837197 seconds [v4]->v5: - Rebase onto crng/random.git: - Remove two selftest patches. - Remove __arch_chacha20_blocks_nostack forward declaration. - Squash the remaining selftest patch into the vDSO getrandom implementation patch. - Remove ifdef CONFIG_VDSO_GETRANDOM and $(CONFIG_VDSO_GETRANDOM) as they are always true in arch/loongarch. - Remove asm-offsets.c change which has been already unneeded in v4. - Add comment about rematerializing the constant in the assembly code. - Add prototype for __vdso_getrandom to silence a -Wmissing-prototype warning. [v3]->v4: - Remove LSX implementation, which isn't much faster than the generic implementaion. - Rebase onto crng/random.git: - Define __arch_get_k_vdso_rng_data instead of using inline asm to provide the _vdso_rng_data symbol in a magic way. - Remove memset.S. - Use c-getrandom-y to easily include the generic C code. - The benchmark results seem better than v3, maybe related to the TLS refactoring in random.git. - Add patches for selftests. [v2]->v3: - Add a generic LoongArch implementation for which LSX isn't needed. v1->v2: - Properly send the series to the list. [v4]:https://lore.kernel.org/all/20240827132018.88854-1-xry111@xry111.site/ [v3]:https://lore.kernel.org/all/20240816110717.10249-1-xry111@xry111.site/ [v2]:https://lore.kernel.org/all/20240815133357.35829-1-xry111@xry111.site/ arch/loongarch/Kconfig | 1 + arch/loongarch/include/asm/vdso/getrandom.h | 44 ++++ arch/loongarch/include/asm/vdso/vdso.h | 6 + arch/loongarch/include/asm/vdso/vsyscall.h | 8 + arch/loongarch/kernel/asm-offsets.c | 1 + arch/loongarch/kernel/vdso.c | 2 + arch/loongarch/vdso/Makefile | 7 +- arch/loongarch/vdso/vdso.lds.S | 1 + arch/loongarch/vdso/vgetrandom-chacha.S | 242 ++++++++++++++++++++ arch/loongarch/vdso/vgetrandom.c | 15 ++ tools/arch/loongarch/vdso | 1 + tools/testing/selftests/vDSO/Makefile | 4 +- 12 files changed, 329 insertions(+), 3 deletions(-) create mode 100644 arch/loongarch/include/asm/vdso/getrandom.h create mode 100644 arch/loongarch/vdso/vgetrandom-chacha.S create mode 100644 arch/loongarch/vdso/vgetrandom.c create mode 120000 tools/arch/loongarch/vdso base-commit: ec309dd126280ca4ef4087e02246a6f61f36c8d1