Message ID | 20240829201728.2825-1-adhemerval.zanella@linaro.org (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Series | [v2] aarch64: vdso: Wire up getrandom() vDSO implementation |
Hi Catalin, Will, Adhemerval,

On Thu, Aug 29, 2024 at 08:17:14PM +0000, Adhemerval Zanella wrote:
> Hook up the generic vDSO implementation to the aarch64 vDSO data page.
> The _vdso_rng_data required data is placed within the _vdso_data vvar
> page, by using a offset larger than the vdso_data.
>
> The vDSO function requires a ChaCha20 implementation that does not
> write to the stack, and that can do an entire ChaCha20 permutation.
> The one provided is based on the current chacha-neon-core.S and uses NEON
> on the permute operation. The fallback for chips that do not support
> NEON issues the syscall.
>
> This also passes the vdso_test_chacha test along with
> vdso_test_getrandom. The vdso_test_getrandom bench-single result on
> Neoverse-N1 shows:
>
> vdso: 25000000 times in 0.746506464 seconds
> libc: 25000000 times in 8.849179444 seconds
> syscall: 25000000 times in 8.818726425 seconds

Aside from the big endian concerns we discussed on IRC, this is looking
fine to me, and I'd like to get some variant of this queued up in my
random.git tree for 6.12 soon.

But first, Catalin or Will -- could one of you take a look and provide
your Acked-by for that, if the patch looks good to you?

Thanks,
Jason
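For scale, the bench-single figures quoted in the commit message work out to roughly 30 ns per call through the vDSO against roughly 353 ns through the raw syscall, a ratio of about 11.8x. A minimal C sketch of that arithmetic follows. The helper names `ns_per_call` and `speedup` and the `BENCH_*` macros are illustrative only and are not part of the patch or the selftest.

```c
#include <stdio.h>

/* Figures copied from the bench-single output quoted in the commit message. */
#define BENCH_CALLS     25000000L
#define BENCH_VDSO_S    0.746506464
#define BENCH_LIBC_S    8.849179444
#define BENCH_SYSCALL_S 8.818726425

/* Average cost of a single call, in nanoseconds. */
static inline double ns_per_call(double seconds, long calls)
{
	return seconds * 1e9 / (double)calls;
}

/* How many times faster the candidate run is than the baseline run. */
static inline double speedup(double baseline_s, double candidate_s)
{
	return baseline_s / candidate_s;
}

/* Print the per-call cost of each path and the vDSO speedup (hypothetical helper). */
static inline void print_bench_summary(void)
{
	printf("vdso:    %.2f ns/call\n", ns_per_call(BENCH_VDSO_S, BENCH_CALLS));
	printf("libc:    %.2f ns/call\n", ns_per_call(BENCH_LIBC_S, BENCH_CALLS));
	printf("syscall: %.2f ns/call\n", ns_per_call(BENCH_SYSCALL_S, BENCH_CALLS));
	printf("vdso vs syscall: %.1fx\n", speedup(BENCH_SYSCALL_S, BENCH_VDSO_S));
}
```

Calling `print_bench_summary()` from a test program prints the three per-call costs. The numbers describe only the Neoverse-N1 run quoted above.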
On Thu, Aug 29, 2024 at 08:17:14PM +0000, Adhemerval Zanella wrote: > Hook up the generic vDSO implementation to the aarch64 vDSO data page. > The _vdso_rng_data required data is placed within the _vdso_data vvar > page, by using a offset larger than the vdso_data. > > The vDSO function requires a ChaCha20 implementation that does not > write to the stack, and that can do an entire ChaCha20 permutation. > The one provided is based on the current chacha-neon-core.S and uses NEON > on the permute operation. The fallback for chips that do not support > NEON issues the syscall. > > This also passes the vdso_test_chacha test along with > vdso_test_getrandom. The vdso_test_getrandom bench-single result on > Neoverse-N1 shows: > > vdso: 25000000 times in 0.746506464 seconds > libc: 25000000 times in 8.849179444 seconds > syscall: 25000000 times in 8.818726425 seconds > > Changes from v1: > - Fixed style issues and typos. > - Added fallback for systems without NEON support. > - Avoid use of non-volatile vector registers in neon chacha20. > - Use c-getrandom-y for vgetrandom.c. > - Fixed TIMENS vdso_rnd_data access. > > Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> > --- > arch/arm64/Kconfig | 1 + > arch/arm64/include/asm/vdso.h | 6 + > arch/arm64/include/asm/vdso/getrandom.h | 49 ++++++ > arch/arm64/include/asm/vdso/vsyscall.h | 10 ++ > arch/arm64/kernel/vdso.c | 6 - > arch/arm64/kernel/vdso/Makefile | 11 +- > arch/arm64/kernel/vdso/vdso | 1 + > arch/arm64/kernel/vdso/vdso.lds.S | 4 + > arch/arm64/kernel/vdso/vgetrandom-chacha.S | 168 +++++++++++++++++++++ > arch/arm64/kernel/vdso/vgetrandom.c | 15 ++ > lib/vdso/getrandom.c | 1 + > tools/arch/arm64/vdso | 1 + > tools/include/linux/compiler.h | 4 + > tools/testing/selftests/vDSO/Makefile | 5 +- Please can you split the tools/ changes into a separate patch? 
> 14 files changed, 273 insertions(+), 9 deletions(-) > create mode 100644 arch/arm64/include/asm/vdso/getrandom.h > create mode 120000 arch/arm64/kernel/vdso/vdso > create mode 100644 arch/arm64/kernel/vdso/vgetrandom-chacha.S > create mode 100644 arch/arm64/kernel/vdso/vgetrandom.c > create mode 120000 tools/arch/arm64/vdso > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index a2f8ff354ca6..7f7424d1b3b8 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -262,6 +262,7 @@ config ARM64 > select TRACE_IRQFLAGS_NMI_SUPPORT > select HAVE_SOFTIRQ_ON_OWN_STACK > select USER_STACKTRACE_SUPPORT > + select VDSO_GETRANDOM > help > ARM 64-bit (AArch64) Linux support. > > diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h > index 4305995c8f82..18407b757c95 100644 > --- a/arch/arm64/include/asm/vdso.h > +++ b/arch/arm64/include/asm/vdso.h > @@ -16,6 +16,12 @@ > > #ifndef __ASSEMBLY__ > > +enum vvar_pages { > + VVAR_DATA_PAGE_OFFSET, > + VVAR_TIMENS_PAGE_OFFSET, > + VVAR_NR_PAGES, > +}; > + > #include <generated/vdso-offsets.h> > > #define VDSO_SYMBOL(base, name) \ > diff --git a/arch/arm64/include/asm/vdso/getrandom.h b/arch/arm64/include/asm/vdso/getrandom.h > new file mode 100644 > index 000000000000..fca66ba49d4c > --- /dev/null > +++ b/arch/arm64/include/asm/vdso/getrandom.h > @@ -0,0 +1,49 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > + > +#ifndef __ASM_VDSO_GETRANDOM_H > +#define __ASM_VDSO_GETRANDOM_H > + > +#ifndef __ASSEMBLY__ > + > +#include <asm/vdso.h> > +#include <asm/unistd.h> > +#include <vdso/datapage.h> > + > +/** > + * getrandom_syscall - Invoke the getrandom() syscall. > + * @buffer: Destination buffer to fill with random bytes. > + * @len: Size of @buffer in bytes. > + * @flags: Zero or more GRND_* flags. > + * Returns: The number of random bytes written to @buffer, or a negative value indicating an error. 
> + */ > +static __always_inline ssize_t getrandom_syscall(void *_buffer, size_t _len, unsigned int _flags) > +{ > + register void *buffer asm ("x0") = _buffer; > + register size_t len asm ("x1") = _len; > + register unsigned int flags asm ("x2") = _flags; > + register long ret asm ("x0"); > + register long nr asm ("x8") = __NR_getrandom; > + > + asm volatile( > + " svc #0\n" > + : "=r" (ret) > + : "r" (buffer), "r" (len), "r" (flags), "r" (nr) > + : "memory"); > + > + return ret; > +} > + > +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void) > +{ > + /* > + * If a task belongs to a time namespace then a namespace the real > + * VVAR page is mapped with the VVAR_TIMENS_PAGE_OFFSET. > + */ This comment doesn't make sense. > + if (IS_ENABLED(CONFIG_TIME_NS) && _vdso_data->clock_mode == VDSO_CLOCKMODE_TIMENS) > + return (void*)&_vdso_rng_data + VVAR_TIMENS_PAGE_OFFSET * PAGE_SIZE; > + return &_vdso_rng_data; > +} > + > +#endif /* !__ASSEMBLY__ */ > + > +#endif /* __ASM_VDSO_GETRANDOM_H */ > diff --git a/arch/arm64/include/asm/vdso/vsyscall.h b/arch/arm64/include/asm/vdso/vsyscall.h > index f94b1457c117..2a87f0e1b144 100644 > --- a/arch/arm64/include/asm/vdso/vsyscall.h > +++ b/arch/arm64/include/asm/vdso/vsyscall.h > @@ -2,8 +2,11 @@ > #ifndef __ASM_VDSO_VSYSCALL_H > #define __ASM_VDSO_VSYSCALL_H > > +#define __VDSO_RND_DATA_OFFSET 480 Why 480? 
> + > #ifndef __ASSEMBLY__ > > +#include <asm/vdso.h> > #include <linux/timekeeper_internal.h> > #include <vdso/datapage.h> > > @@ -21,6 +24,13 @@ struct vdso_data *__arm64_get_k_vdso_data(void) > } > #define __arch_get_k_vdso_data __arm64_get_k_vdso_data > > +static __always_inline > +struct vdso_rng_data *__arm64_get_k_vdso_rnd_data(void) > +{ > + return (void*)vdso_data + __VDSO_RND_DATA_OFFSET; > +} > +#define __arch_get_k_vdso_rng_data __arm64_get_k_vdso_rnd_data > + > static __always_inline > void __arm64_update_vsyscall(struct vdso_data *vdata, struct timekeeper *tk) > { > diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c > index 89b6e7840002..706c9c3a7a50 100644 > --- a/arch/arm64/kernel/vdso.c > +++ b/arch/arm64/kernel/vdso.c > @@ -34,12 +34,6 @@ enum vdso_abi { > VDSO_ABI_AA32, > }; > > -enum vvar_pages { > - VVAR_DATA_PAGE_OFFSET, > - VVAR_TIMENS_PAGE_OFFSET, > - VVAR_NR_PAGES, > -}; > - > struct vdso_abi_info { > const char *name; > const char *vdso_code_start; > diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile > index d11da6461278..50246a38d6bd 100644 > --- a/arch/arm64/kernel/vdso/Makefile > +++ b/arch/arm64/kernel/vdso/Makefile > @@ -9,7 +9,7 @@ > # Include the generic Makefile to check the built vdso. 
> include $(srctree)/lib/vdso/Makefile > > -obj-vdso := vgettimeofday.o note.o sigreturn.o > +obj-vdso := vgettimeofday.o note.o sigreturn.o vgetrandom.o vgetrandom-chacha.o > > # Build rules > targets := $(obj-vdso) vdso.so vdso.so.dbg > @@ -40,13 +40,22 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) \ > $(RANDSTRUCT_CFLAGS) $(GCC_PLUGINS_CFLAGS) \ > $(CC_FLAGS_LTO) $(CC_FLAGS_CFI) \ > -Wmissing-prototypes -Wmissing-declarations > +CFLAGS_REMOVE_vgetrandom.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) \ > + $(RANDSTRUCT_CFLAGS) $(GCC_PLUGINS_CFLAGS) \ > + $(CC_FLAGS_LTO) $(CC_FLAGS_CFI) \ > + -Wmissing-prototypes -Wmissing-declarations > > CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables > +CFLAGS_vgetrandom.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables You're using identical CFLAGS_ and CFLAGS_REMOVE_ definitions for vgettimeofdat.o and vgetrandom.o. Please refactor this so that they use common definitions. > diff --git a/arch/arm64/kernel/vdso/vdso b/arch/arm64/kernel/vdso/vdso > new file mode 120000 > index 000000000000..233c7a26f6e5 > --- /dev/null > +++ b/arch/arm64/kernel/vdso/vdso > @@ -0,0 +1 @@ > +../../../arch/arm64/kernel/vdso > \ No newline at end of file > diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S > index 45354f2ddf70..f204a9ddc833 100644 > --- a/arch/arm64/kernel/vdso/vdso.lds.S > +++ b/arch/arm64/kernel/vdso/vdso.lds.S > @@ -11,7 +11,9 @@ > #include <linux/const.h> > #include <asm/page.h> > #include <asm/vdso.h> > +#include <asm/vdso/vsyscall.h> > #include <asm-generic/vmlinux.lds.h> > +#include <vdso/datapage.h> > > OUTPUT_FORMAT("elf64-littleaarch64", "elf64-bigaarch64", "elf64-littleaarch64") > OUTPUT_ARCH(aarch64) > @@ -19,6 +21,7 @@ OUTPUT_ARCH(aarch64) > SECTIONS > { > PROVIDE(_vdso_data = . 
- __VVAR_PAGES * PAGE_SIZE); > + PROVIDE(_vdso_rng_data = _vdso_data + __VDSO_RND_DATA_OFFSET); > #ifdef CONFIG_TIME_NS > PROVIDE(_timens_data = _vdso_data + PAGE_SIZE); > #endif > @@ -102,6 +105,7 @@ VERSION > __kernel_gettimeofday; > __kernel_clock_gettime; > __kernel_clock_getres; > + __kernel_getrandom; > local: *; > }; > } > diff --git a/arch/arm64/kernel/vdso/vgetrandom-chacha.S b/arch/arm64/kernel/vdso/vgetrandom-chacha.S > new file mode 100644 > index 000000000000..9ebf12a09c65 > --- /dev/null > +++ b/arch/arm64/kernel/vdso/vgetrandom-chacha.S > @@ -0,0 +1,168 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +#include <linux/linkage.h> > +#include <asm/cache.h> > +#include <asm/assembler.h> > + > + .text > + > +#define state0 v0 > +#define state1 v1 > +#define state2 v2 > +#define state3 v3 > +#define copy0 v4 > +#define copy1 v5 > +#define copy2 v6 > +#define copy3 v7 > +#define copy3_d d7 > +#define one_d d16 > +#define one_q q16 > +#define tmp v17 > +#define rot8 v18 > + > +/* > + * ARM64 ChaCha20 implementation meant for vDSO. Produces a given positive > + * number of blocks of output with nonce 0, taking an input key and 8-bytes > + * counter. Importantly does not spill to the stack. > + * > + * void __arch_chacha20_blocks_nostack(uint8_t *dst_bytes, > + * const uint8_t *key, > + * uint32_t *counter, > + * size_t nblocks) > + * > + * x0: output bytes > + * x1: 32-byte key input > + * x2: 8-byte counter input/output > + * x3: number of 64-byte block to write to output > + */ > +SYM_FUNC_START(__arch_chacha20_blocks_nostack) Is there any way we can reuse the existing code in crypto/chacha-neon-core.S for this? It looks to my untrained eye like this is an arbitrarily different implementation to what we already have. 
> + /* copy0 = "expand 32-byte k" */ > + adr_l x8, CTES > + ld1 {copy0.4s}, [x8] > + /* copy1,copy2 = key */ > + ld1 { copy1.4s, copy2.4s }, [x1] > + /* copy3 = counter || zero nonce */ > + ldr copy3_d, [x2] > + > + adr_l x8, ONE > + ldr one_q, [x8] > + > + adr_l x10, ROT8 > + ld1 {rot8.4s}, [x10] > +.Lblock: > + /* copy state to auxiliary vectors for the final add after the permute. */ > + mov state0.16b, copy0.16b > + mov state1.16b, copy1.16b > + mov state2.16b, copy2.16b > + mov state3.16b, copy3.16b > + > + mov w4, 20 > +.Lpermute: > + /* > + * Permute one 64-byte block where the state matrix is stored in the four NEON > + * registers state0-state3. It performs matrix operations on four words in parallel, > + * but requires shuffling to rearrange the words after each round. > + */ > + > +.Ldoubleround: > + /* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ > + add state0.4s, state0.4s, state1.4s > + eor state3.16b, state3.16b, state0.16b > + rev32 state3.8h, state3.8h > + > + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ > + add state2.4s, state2.4s, state3.4s > + eor tmp.16b, state1.16b, state2.16b > + shl state1.4s, tmp.4s, #12 > + sri state1.4s, tmp.4s, #20 > + > + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ > + add state0.4s, state0.4s, state1.4s > + eor state3.16b, state3.16b, state0.16b > + tbl state3.16b, {state3.16b}, rot8.16b > + > + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ > + add state2.4s, state2.4s, state3.4s > + eor tmp.16b, state1.16b, state2.16b > + shl state1.4s, tmp.4s, #7 > + sri state1.4s, tmp.4s, #25 > + > + /* state1[0,1,2,3] = state1[1,2,3,0] */ > + ext state1.16b, state1.16b, state1.16b, #4 > + /* state2[0,1,2,3] = state2[2,3,0,1] */ > + ext state2.16b, state2.16b, state2.16b, #8 > + /* state3[0,1,2,3] = state3[1,2,3,0] */ > + ext state3.16b, state3.16b, state3.16b, #12 > + > + /* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ > + add state0.4s, state0.4s, state1.4s 
> + eor state3.16b, state3.16b, state0.16b > + rev32 state3.8h, state3.8h > + > + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ > + add state2.4s, state2.4s, state3.4s > + eor tmp.16b, state1.16b, state2.16b > + shl state1.4s, tmp.4s, #12 > + sri state1.4s, tmp.4s, #20 > + > + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ > + add state0.4s, state0.4s, state1.4s > + eor state3.16b, state3.16b, state0.16b > + tbl state3.16b, {state3.16b}, rot8.16b > + > + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ > + add state2.4s, state2.4s, state3.4s > + eor tmp.16b, state1.16b, state2.16b > + shl state1.4s, tmp.4s, #7 > + sri state1.4s, tmp.4s, #25 > + > + /* state1[0,1,2,3] = state1[3,0,1,2] */ > + ext state1.16b, state1.16b, state1.16b, #12 > + /* state2[0,1,2,3] = state2[2,3,0,1] */ > + ext state2.16b, state2.16b, state2.16b, #8 > + /* state3[0,1,2,3] = state3[1,2,3,0] */ > + ext state3.16b, state3.16b, state3.16b, #4 > + > + subs w4, w4, #2 > + b.ne .Ldoubleround > + > + /* output0 = state0 + state0 */ > + add state0.4s, state0.4s, copy0.4s > + /* output1 = state1 + state1 */ > + add state1.4s, state1.4s, copy1.4s > + /* output2 = state2 + state2 */ > + add state2.4s, state2.4s, copy2.4s > + /* output2 = state3 + state3 */ > + add state3.4s, state3.4s, copy3.4s > + st1 { state0.4s - state3.4s }, [x0] > + > + /* ++copy3.counter */ > + add copy3_d, copy3_d, one_d > + > + /* output += 64, --nblocks */ > + add x0, x0, 64 > + subs x3, x3, #1 > + b.ne .Lblock > + > + /* counter = copy3.counter */ > + str copy3_d, [x2] > + > + /* Zero out the potentially sensitive regs, in case nothing uses these again. 
*/ > + eor state0.16b, state0.16b, state0.16b > + eor state1.16b, state1.16b, state1.16b > + eor state2.16b, state2.16b, state2.16b > + eor state3.16b, state3.16b, state3.16b > + eor copy1.16b, copy1.16b, copy1.16b > + eor copy2.16b, copy2.16b, copy2.16b > + ret > +SYM_FUNC_END(__arch_chacha20_blocks_nostack) > + > + .section ".rodata", "a", %progbits > + .align L1_CACHE_SHIFT > + > +CTES: .word 1634760805, 857760878, 2036477234, 1797285236 > +ONE: .xword 1, 0 > +ROT8: .word 0x02010003, 0x06050407, 0x0a09080b, 0x0e0d0c0f > + > +emit_aarch64_feature_1_and > diff --git a/arch/arm64/kernel/vdso/vgetrandom.c b/arch/arm64/kernel/vdso/vgetrandom.c > new file mode 100644 > index 000000000000..0833d25f3121 > --- /dev/null > +++ b/arch/arm64/kernel/vdso/vgetrandom.c > @@ -0,0 +1,15 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +typeof(__cvdso_getrandom) __kernel_getrandom; > + > +ssize_t __kernel_getrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state, size_t opaque_len) > +{ > + asm goto ( > + ALTERNATIVE("b %[fallback]", "nop", RM64_HAS_FPSIMD) : : : : fallback); "RM64_HAS_FPSIMD". Are you sure you've tested this? > + return __cvdso_getrandom(buffer, len, flags, opaque_state, opaque_len); > + > +fallback: > + if (unlikely(opaque_len == ~0UL && !buffer && !len && !flags)) > + return -ENOSYS; > + return getrandom_syscall(buffer, len, flags); > +} > diff --git a/lib/vdso/getrandom.c b/lib/vdso/getrandom.c > index 938ca539aaa6..7c9711248d9b 100644 > --- a/lib/vdso/getrandom.c > +++ b/lib/vdso/getrandom.c > @@ -5,6 +5,7 @@ > > #include <linux/array_size.h> > #include <linux/minmax.h> > +#include <linux/mm.h> > #include <vdso/datapage.h> > #include <vdso/getrandom.h> > #include <vdso/unaligned.h> Looks like this should be a separate change? Will
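For readers following the review, the address arithmetic behind __arch_get_vdso_rng_data() and __VDSO_RND_DATA_OFFSET can be sketched in plain userspace C as below. This is an illustration under stated assumptions, not kernel code: the 4 KiB page size is assumed, every `sketch_`/`SKETCH_` name is hypothetical, and only the value 480 and the enum ordering are taken from the patch. The idea is that the rng data sits at a fixed offset in the data page, and a task in a time namespace has to look one page further to reach the real data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE       4096u	/* assumption: 4 KiB pages */
#define SKETCH_RND_DATA_OFFSET 480u	/* mirrors __VDSO_RND_DATA_OFFSET in the patch */

/* Same ordering as enum vvar_pages in the patch. */
enum sketch_vvar_pages {
	SKETCH_VVAR_DATA_PAGE_OFFSET,
	SKETCH_VVAR_TIMENS_PAGE_OFFSET,
	SKETCH_VVAR_NR_PAGES,
};

/*
 * Return the location of the rng data inside a vvar-like region.
 * Outside a time namespace it is SKETCH_RND_DATA_OFFSET bytes into the
 * first page; inside one, the real data page is mapped one page later,
 * so the same in-page offset is applied to that page instead.
 */
static inline const void *sketch_get_rng_data(const uint8_t *vvar_base,
					      bool in_timens)
{
	size_t off = SKETCH_RND_DATA_OFFSET;

	if (in_timens)
		off += SKETCH_VVAR_TIMENS_PAGE_OFFSET * SKETCH_PAGE_SIZE;
	return vvar_base + off;
}
```

With these assumptions the two cases resolve to byte offsets 480 and 4576 from the start of the region.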
> > +SYM_FUNC_START(__arch_chacha20_blocks_nostack)
>
> Is there any way we can reuse the existing code in
> crypto/chacha-neon-core.S for this? It looks to my untrained eye like
> this is an arbitrarily different implementation to what we already have.

Nope, it is indeed different, and not arbitrarily so. This patch is
mirroring exactly what we did on x86.

Jason
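For comparison, here is a portable C sketch of what a routine with the contract described in the patch comment computes: ChaCha20 blocks with a zero nonce, a 32-byte key and a 64-bit block counter that is written back on return. It is a scalar reference under my own assumptions, not the kernel's implementation. All `sketch_` names are hypothetical, and unlike the assembly it freely uses the stack, which is exactly what the vDSO version has to avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static inline uint32_t sketch_rotl32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter round on four words of the 16-word state. */
static inline void sketch_quarter_round(uint32_t x[16], int a, int b, int c, int d)
{
	x[a] += x[b]; x[d] = sketch_rotl32(x[d] ^ x[a], 16);
	x[c] += x[d]; x[b] = sketch_rotl32(x[b] ^ x[c], 12);
	x[a] += x[b]; x[d] = sketch_rotl32(x[d] ^ x[a], 8);
	x[c] += x[d]; x[b] = sketch_rotl32(x[b] ^ x[c], 7);
}

static inline uint32_t sketch_load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static inline void sketch_store_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Write nblocks 64-byte keystream blocks to dst and update the counter. */
static inline void sketch_chacha20_blocks(uint8_t *dst, const uint8_t key[32],
					  uint32_t counter[2], size_t nblocks)
{
	uint32_t init[16], x[16];
	size_t i;

	assert(nblocks > 0);	/* the patch comment requires a positive count */

	/* "expand 32-byte k", same values as CTES in the patch */
	init[0] = 1634760805u; init[1] = 857760878u;
	init[2] = 2036477234u; init[3] = 1797285236u;
	for (i = 0; i < 8; i++)
		init[4 + i] = sketch_load_le32(key + 4 * i);
	init[12] = counter[0];	/* 64-bit counter, low word first */
	init[13] = counter[1];
	init[14] = 0;		/* zero nonce */
	init[15] = 0;

	while (nblocks--) {
		for (i = 0; i < 16; i++)
			x[i] = init[i];
		for (i = 0; i < 10; i++) {	/* 10 double rounds = 20 rounds */
			/* column round */
			sketch_quarter_round(x, 0, 4, 8, 12);
			sketch_quarter_round(x, 1, 5, 9, 13);
			sketch_quarter_round(x, 2, 6, 10, 14);
			sketch_quarter_round(x, 3, 7, 11, 15);
			/* diagonal round */
			sketch_quarter_round(x, 0, 5, 10, 15);
			sketch_quarter_round(x, 1, 6, 11, 12);
			sketch_quarter_round(x, 2, 7, 8, 13);
			sketch_quarter_round(x, 3, 4, 9, 14);
		}
		/* final add of the initial state, serialized little-endian */
		for (i = 0; i < 16; i++)
			sketch_store_le32(dst + 4 * i, x[i] + init[i]);
		dst += 64;
		if (++init[12] == 0)	/* carry into the high counter word */
			++init[13];
	}
	counter[0] = init[12];
	counter[1] = init[13];
}
```

The NEON routine keeps each row of the 4x4 state in one vector register and rotates rows with `ext` between the column and diagonal halves, whereas this sketch indexes the diagonals directly. The explicit little-endian loads and stores are also where the big endian concern mentioned earlier in the thread would show up in a vector implementation.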
On 30/08/24 08:46, Will Deacon wrote: > On Thu, Aug 29, 2024 at 08:17:14PM +0000, Adhemerval Zanella wrote: >> Hook up the generic vDSO implementation to the aarch64 vDSO data page. >> The _vdso_rng_data required data is placed within the _vdso_data vvar >> page, by using a offset larger than the vdso_data. >> >> The vDSO function requires a ChaCha20 implementation that does not >> write to the stack, and that can do an entire ChaCha20 permutation. >> The one provided is based on the current chacha-neon-core.S and uses NEON >> on the permute operation. The fallback for chips that do not support >> NEON issues the syscall. >> >> This also passes the vdso_test_chacha test along with >> vdso_test_getrandom. The vdso_test_getrandom bench-single result on >> Neoverse-N1 shows: >> >> vdso: 25000000 times in 0.746506464 seconds >> libc: 25000000 times in 8.849179444 seconds >> syscall: 25000000 times in 8.818726425 seconds >> >> Changes from v1: >> - Fixed style issues and typos. >> - Added fallback for systems without NEON support. >> - Avoid use of non-volatile vector registers in neon chacha20. >> - Use c-getrandom-y for vgetrandom.c. >> - Fixed TIMENS vdso_rnd_data access. >> >> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> >> --- >> arch/arm64/Kconfig | 1 + >> arch/arm64/include/asm/vdso.h | 6 + >> arch/arm64/include/asm/vdso/getrandom.h | 49 ++++++ >> arch/arm64/include/asm/vdso/vsyscall.h | 10 ++ >> arch/arm64/kernel/vdso.c | 6 - >> arch/arm64/kernel/vdso/Makefile | 11 +- >> arch/arm64/kernel/vdso/vdso | 1 + >> arch/arm64/kernel/vdso/vdso.lds.S | 4 + >> arch/arm64/kernel/vdso/vgetrandom-chacha.S | 168 +++++++++++++++++++++ >> arch/arm64/kernel/vdso/vgetrandom.c | 15 ++ >> lib/vdso/getrandom.c | 1 + >> tools/arch/arm64/vdso | 1 + >> tools/include/linux/compiler.h | 4 + >> tools/testing/selftests/vDSO/Makefile | 5 +- > > Please can you split the tools/ changes into a separate patch? 
Alright, it would require to be after the inclusion on vgetrandom-chacha.S otherwise vdso_test_chacha will not build on aarch64. > >> 14 files changed, 273 insertions(+), 9 deletions(-) >> create mode 100644 arch/arm64/include/asm/vdso/getrandom.h >> create mode 120000 arch/arm64/kernel/vdso/vdso >> create mode 100644 arch/arm64/kernel/vdso/vgetrandom-chacha.S >> create mode 100644 arch/arm64/kernel/vdso/vgetrandom.c >> create mode 120000 tools/arch/arm64/vdso >> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig >> index a2f8ff354ca6..7f7424d1b3b8 100644 >> --- a/arch/arm64/Kconfig >> +++ b/arch/arm64/Kconfig >> @@ -262,6 +262,7 @@ config ARM64 >> select TRACE_IRQFLAGS_NMI_SUPPORT >> select HAVE_SOFTIRQ_ON_OWN_STACK >> select USER_STACKTRACE_SUPPORT >> + select VDSO_GETRANDOM >> help >> ARM 64-bit (AArch64) Linux support. >> >> diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h >> index 4305995c8f82..18407b757c95 100644 >> --- a/arch/arm64/include/asm/vdso.h >> +++ b/arch/arm64/include/asm/vdso.h >> @@ -16,6 +16,12 @@ >> >> #ifndef __ASSEMBLY__ >> >> +enum vvar_pages { >> + VVAR_DATA_PAGE_OFFSET, >> + VVAR_TIMENS_PAGE_OFFSET, >> + VVAR_NR_PAGES, >> +}; >> + >> #include <generated/vdso-offsets.h> >> >> #define VDSO_SYMBOL(base, name) \ >> diff --git a/arch/arm64/include/asm/vdso/getrandom.h b/arch/arm64/include/asm/vdso/getrandom.h >> new file mode 100644 >> index 000000000000..fca66ba49d4c >> --- /dev/null >> +++ b/arch/arm64/include/asm/vdso/getrandom.h >> @@ -0,0 +1,49 @@ >> +/* SPDX-License-Identifier: GPL-2.0 */ >> + >> +#ifndef __ASM_VDSO_GETRANDOM_H >> +#define __ASM_VDSO_GETRANDOM_H >> + >> +#ifndef __ASSEMBLY__ >> + >> +#include <asm/vdso.h> >> +#include <asm/unistd.h> >> +#include <vdso/datapage.h> >> + >> +/** >> + * getrandom_syscall - Invoke the getrandom() syscall. >> + * @buffer: Destination buffer to fill with random bytes. >> + * @len: Size of @buffer in bytes. >> + * @flags: Zero or more GRND_* flags. 
>> + * Returns: The number of random bytes written to @buffer, or a negative value indicating an error.
>> + */
>> +static __always_inline ssize_t getrandom_syscall(void *_buffer, size_t _len, unsigned int _flags)
>> +{
>> +	register void *buffer asm ("x0") = _buffer;
>> +	register size_t len asm ("x1") = _len;
>> +	register unsigned int flags asm ("x2") = _flags;
>> +	register long ret asm ("x0");
>> +	register long nr asm ("x8") = __NR_getrandom;
>> +
>> +	asm volatile(
>> +	" svc #0\n"
>> +	: "=r" (ret)
>> +	: "r" (buffer), "r" (len), "r" (flags), "r" (nr)
>> +	: "memory");
>> +
>> +	return ret;
>> +}
>> +
>> +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void)
>> +{
>> +	/*
>> +	 * If a task belongs to a time namespace then a namespace the real
>> +	 * VVAR page is mapped with the VVAR_TIMENS_PAGE_OFFSET.
>> +	 */
>
> This comment doesn't make sense.

I rephrased it from arch/arm64/kernel/vdso.c (vvar_fault). Did I confuse
something? This is indeed required; otherwise the getrandom vDSO in a
time namespace does not see the generation counter correctly and falls
back to the syscall.

>
>> +	if (IS_ENABLED(CONFIG_TIME_NS) && _vdso_data->clock_mode == VDSO_CLOCKMODE_TIMENS)
>> +		return (void*)&_vdso_rng_data + VVAR_TIMENS_PAGE_OFFSET * PAGE_SIZE;
>> +	return &_vdso_rng_data;
>> +}
>> +
>> +#endif /* !__ASSEMBLY__ */
>> +
>> +#endif /* __ASM_VDSO_GETRANDOM_H */
>> diff --git a/arch/arm64/include/asm/vdso/vsyscall.h b/arch/arm64/include/asm/vdso/vsyscall.h
>> index f94b1457c117..2a87f0e1b144 100644
>> --- a/arch/arm64/include/asm/vdso/vsyscall.h
>> +++ b/arch/arm64/include/asm/vdso/vsyscall.h
>> @@ -2,8 +2,11 @@
>>  #ifndef __ASM_VDSO_VSYSCALL_H
>>  #define __ASM_VDSO_VSYSCALL_H
>>
>> +#define __VDSO_RND_DATA_OFFSET 480
>
> Why 480?

I used the x86 strategy to place the vdso_rng_data and the vdso_data,
but I could not make it fit the vdso_data generation with the linker
script machinery required for vdso.lds.S (I think Jason faced a similar
issue with x86). I will try to see if I can refactor the vdso_data
definition in a subsequent patch to place the vdso_rng_data in a common
struct. It does not help that each architecture now seems to be placing
the vdso_rng_data in a different place.

>
>> +
>>  #ifndef __ASSEMBLY__
>>
>> +#include <asm/vdso.h>
>>  #include <linux/timekeeper_internal.h>
>>  #include <vdso/datapage.h>
>>
>> @@ -21,6 +24,13 @@ struct vdso_data *__arm64_get_k_vdso_data(void)
>>  }
>>  #define __arch_get_k_vdso_data __arm64_get_k_vdso_data
>>
>> +static __always_inline
>> +struct vdso_rng_data *__arm64_get_k_vdso_rnd_data(void)
>> +{
>> +	return (void*)vdso_data + __VDSO_RND_DATA_OFFSET;
>> +}
>> +#define __arch_get_k_vdso_rng_data __arm64_get_k_vdso_rnd_data
>> +
>>  static __always_inline
>>  void __arm64_update_vsyscall(struct vdso_data *vdata, struct timekeeper *tk)
>>  {
>> diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
>> index 89b6e7840002..706c9c3a7a50 100644
>> --- a/arch/arm64/kernel/vdso.c
>> +++ b/arch/arm64/kernel/vdso.c
>> @@ -34,12 +34,6 @@ enum vdso_abi {
>>  	VDSO_ABI_AA32,
>>  };
>>
>> -enum vvar_pages {
>> -	VVAR_DATA_PAGE_OFFSET,
>> -	VVAR_TIMENS_PAGE_OFFSET,
>> -	VVAR_NR_PAGES,
>> -};
>> -
>>  struct vdso_abi_info {
>>  	const char *name;
>>  	const char *vdso_code_start;
>> diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
>> index d11da6461278..50246a38d6bd 100644
>> --- a/arch/arm64/kernel/vdso/Makefile
>> +++ b/arch/arm64/kernel/vdso/Makefile
>> @@ -9,7 +9,7 @@
>>  # Include the generic Makefile to check the built vdso.
>> include $(srctree)/lib/vdso/Makefile >> >> -obj-vdso := vgettimeofday.o note.o sigreturn.o >> +obj-vdso := vgettimeofday.o note.o sigreturn.o vgetrandom.o vgetrandom-chacha.o >> >> # Build rules >> targets := $(obj-vdso) vdso.so vdso.so.dbg >> @@ -40,13 +40,22 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) \ >> $(RANDSTRUCT_CFLAGS) $(GCC_PLUGINS_CFLAGS) \ >> $(CC_FLAGS_LTO) $(CC_FLAGS_CFI) \ >> -Wmissing-prototypes -Wmissing-declarations >> +CFLAGS_REMOVE_vgetrandom.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) \ >> + $(RANDSTRUCT_CFLAGS) $(GCC_PLUGINS_CFLAGS) \ >> + $(CC_FLAGS_LTO) $(CC_FLAGS_CFI) \ >> + -Wmissing-prototypes -Wmissing-declarations >> >> CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables >> +CFLAGS_vgetrandom.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables > > You're using identical CFLAGS_ and CFLAGS_REMOVE_ definitions for > vgettimeofdat.o and vgetrandom.o. Please refactor this so that they use > common definitions. Ack. > >> diff --git a/arch/arm64/kernel/vdso/vdso b/arch/arm64/kernel/vdso/vdso >> new file mode 120000 >> index 000000000000..233c7a26f6e5 >> --- /dev/null >> +++ b/arch/arm64/kernel/vdso/vdso >> @@ -0,0 +1 @@ >> +../../../arch/arm64/kernel/vdso >> \ No newline at end of file >> diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S >> index 45354f2ddf70..f204a9ddc833 100644 >> --- a/arch/arm64/kernel/vdso/vdso.lds.S >> +++ b/arch/arm64/kernel/vdso/vdso.lds.S >> @@ -11,7 +11,9 @@ >> #include <linux/const.h> >> #include <asm/page.h> >> #include <asm/vdso.h> >> +#include <asm/vdso/vsyscall.h> >> #include <asm-generic/vmlinux.lds.h> >> +#include <vdso/datapage.h> >> >> OUTPUT_FORMAT("elf64-littleaarch64", "elf64-bigaarch64", "elf64-littleaarch64") >> OUTPUT_ARCH(aarch64) >> @@ -19,6 +21,7 @@ OUTPUT_ARCH(aarch64) >> SECTIONS >> { >> PROVIDE(_vdso_data = . 
- __VVAR_PAGES * PAGE_SIZE); >> + PROVIDE(_vdso_rng_data = _vdso_data + __VDSO_RND_DATA_OFFSET); >> #ifdef CONFIG_TIME_NS >> PROVIDE(_timens_data = _vdso_data + PAGE_SIZE); >> #endif >> @@ -102,6 +105,7 @@ VERSION >> __kernel_gettimeofday; >> __kernel_clock_gettime; >> __kernel_clock_getres; >> + __kernel_getrandom; >> local: *; >> }; >> } >> diff --git a/arch/arm64/kernel/vdso/vgetrandom-chacha.S b/arch/arm64/kernel/vdso/vgetrandom-chacha.S >> new file mode 100644 >> index 000000000000..9ebf12a09c65 >> --- /dev/null >> +++ b/arch/arm64/kernel/vdso/vgetrandom-chacha.S >> @@ -0,0 +1,168 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> + >> +#include <linux/linkage.h> >> +#include <asm/cache.h> >> +#include <asm/assembler.h> >> + >> + .text >> + >> +#define state0 v0 >> +#define state1 v1 >> +#define state2 v2 >> +#define state3 v3 >> +#define copy0 v4 >> +#define copy1 v5 >> +#define copy2 v6 >> +#define copy3 v7 >> +#define copy3_d d7 >> +#define one_d d16 >> +#define one_q q16 >> +#define tmp v17 >> +#define rot8 v18 >> + >> +/* >> + * ARM64 ChaCha20 implementation meant for vDSO. Produces a given positive >> + * number of blocks of output with nonce 0, taking an input key and 8-bytes >> + * counter. Importantly does not spill to the stack. >> + * >> + * void __arch_chacha20_blocks_nostack(uint8_t *dst_bytes, >> + * const uint8_t *key, >> + * uint32_t *counter, >> + * size_t nblocks) >> + * >> + * x0: output bytes >> + * x1: 32-byte key input >> + * x2: 8-byte counter input/output >> + * x3: number of 64-byte block to write to output >> + */ >> +SYM_FUNC_START(__arch_chacha20_blocks_nostack) > > Is there any way we can reuse the existing code in > crypto/chacha-neon-core.S for this? It looks to my untrained eye like > this is an arbitrarily different implementation to what we already have. 
> >> + /* copy0 = "expand 32-byte k" */ >> + adr_l x8, CTES >> + ld1 {copy0.4s}, [x8] >> + /* copy1,copy2 = key */ >> + ld1 { copy1.4s, copy2.4s }, [x1] >> + /* copy3 = counter || zero nonce */ >> + ldr copy3_d, [x2] >> + >> + adr_l x8, ONE >> + ldr one_q, [x8] >> + >> + adr_l x10, ROT8 >> + ld1 {rot8.4s}, [x10] >> +.Lblock: >> + /* copy state to auxiliary vectors for the final add after the permute. */ >> + mov state0.16b, copy0.16b >> + mov state1.16b, copy1.16b >> + mov state2.16b, copy2.16b >> + mov state3.16b, copy3.16b >> + >> + mov w4, 20 >> +.Lpermute: >> + /* >> + * Permute one 64-byte block where the state matrix is stored in the four NEON >> + * registers state0-state3. It performs matrix operations on four words in parallel, >> + * but requires shuffling to rearrange the words after each round. >> + */ >> + >> +.Ldoubleround: >> + /* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ >> + add state0.4s, state0.4s, state1.4s >> + eor state3.16b, state3.16b, state0.16b >> + rev32 state3.8h, state3.8h >> + >> + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ >> + add state2.4s, state2.4s, state3.4s >> + eor tmp.16b, state1.16b, state2.16b >> + shl state1.4s, tmp.4s, #12 >> + sri state1.4s, tmp.4s, #20 >> + >> + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ >> + add state0.4s, state0.4s, state1.4s >> + eor state3.16b, state3.16b, state0.16b >> + tbl state3.16b, {state3.16b}, rot8.16b >> + >> + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ >> + add state2.4s, state2.4s, state3.4s >> + eor tmp.16b, state1.16b, state2.16b >> + shl state1.4s, tmp.4s, #7 >> + sri state1.4s, tmp.4s, #25 >> + >> + /* state1[0,1,2,3] = state1[1,2,3,0] */ >> + ext state1.16b, state1.16b, state1.16b, #4 >> + /* state2[0,1,2,3] = state2[2,3,0,1] */ >> + ext state2.16b, state2.16b, state2.16b, #8 >> + /* state3[0,1,2,3] = state3[1,2,3,0] */ >> + ext state3.16b, state3.16b, state3.16b, #12 >> + >> + /* state0 += state1, state3 = 
rotl32(state3 ^ state0, 16) */ >> + add state0.4s, state0.4s, state1.4s >> + eor state3.16b, state3.16b, state0.16b >> + rev32 state3.8h, state3.8h >> + >> + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ >> + add state2.4s, state2.4s, state3.4s >> + eor tmp.16b, state1.16b, state2.16b >> + shl state1.4s, tmp.4s, #12 >> + sri state1.4s, tmp.4s, #20 >> + >> + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ >> + add state0.4s, state0.4s, state1.4s >> + eor state3.16b, state3.16b, state0.16b >> + tbl state3.16b, {state3.16b}, rot8.16b >> + >> + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ >> + add state2.4s, state2.4s, state3.4s >> + eor tmp.16b, state1.16b, state2.16b >> + shl state1.4s, tmp.4s, #7 >> + sri state1.4s, tmp.4s, #25 >> + >> + /* state1[0,1,2,3] = state1[3,0,1,2] */ >> + ext state1.16b, state1.16b, state1.16b, #12 >> + /* state2[0,1,2,3] = state2[2,3,0,1] */ >> + ext state2.16b, state2.16b, state2.16b, #8 >> + /* state3[0,1,2,3] = state3[1,2,3,0] */ >> + ext state3.16b, state3.16b, state3.16b, #4 >> + >> + subs w4, w4, #2 >> + b.ne .Ldoubleround >> + >> + /* output0 = state0 + state0 */ >> + add state0.4s, state0.4s, copy0.4s >> + /* output1 = state1 + state1 */ >> + add state1.4s, state1.4s, copy1.4s >> + /* output2 = state2 + state2 */ >> + add state2.4s, state2.4s, copy2.4s >> + /* output2 = state3 + state3 */ >> + add state3.4s, state3.4s, copy3.4s >> + st1 { state0.4s - state3.4s }, [x0] >> + >> + /* ++copy3.counter */ >> + add copy3_d, copy3_d, one_d >> + >> + /* output += 64, --nblocks */ >> + add x0, x0, 64 >> + subs x3, x3, #1 >> + b.ne .Lblock >> + >> + /* counter = copy3.counter */ >> + str copy3_d, [x2] >> + >> + /* Zero out the potentially sensitive regs, in case nothing uses these again. 
*/ >> + eor state0.16b, state0.16b, state0.16b >> + eor state1.16b, state1.16b, state1.16b >> + eor state2.16b, state2.16b, state2.16b >> + eor state3.16b, state3.16b, state3.16b >> + eor copy1.16b, copy1.16b, copy1.16b >> + eor copy2.16b, copy2.16b, copy2.16b >> + ret >> +SYM_FUNC_END(__arch_chacha20_blocks_nostack) >> + >> + .section ".rodata", "a", %progbits >> + .align L1_CACHE_SHIFT >> + >> +CTES: .word 1634760805, 857760878, 2036477234, 1797285236 >> +ONE: .xword 1, 0 >> +ROT8: .word 0x02010003, 0x06050407, 0x0a09080b, 0x0e0d0c0f >> + >> +emit_aarch64_feature_1_and >> diff --git a/arch/arm64/kernel/vdso/vgetrandom.c b/arch/arm64/kernel/vdso/vgetrandom.c >> new file mode 100644 >> index 000000000000..0833d25f3121 >> --- /dev/null >> +++ b/arch/arm64/kernel/vdso/vgetrandom.c >> @@ -0,0 +1,15 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> + >> +typeof(__cvdso_getrandom) __kernel_getrandom; >> + >> +ssize_t __kernel_getrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state, size_t opaque_len) >> +{ >> + asm goto ( >> + ALTERNATIVE("b %[fallback]", "nop", RM64_HAS_FPSIMD) : : : : fallback); > > "RM64_HAS_FPSIMD". Are you sure you've tested this? I am not sure why the build did not fail (I double-checked, and it does not generate a wrong relocation) or why the vDSO does seem to have the nop in the expected place. I have changed it to ARM64_HAS_FPSIMD.
> >> + return __cvdso_getrandom(buffer, len, flags, opaque_state, opaque_len); >> + >> +fallback: >> + if (unlikely(opaque_len == ~0UL && !buffer && !len && !flags)) >> + return -ENOSYS; >> + return getrandom_syscall(buffer, len, flags); >> +} >> diff --git a/lib/vdso/getrandom.c b/lib/vdso/getrandom.c >> index 938ca539aaa6..7c9711248d9b 100644 >> --- a/lib/vdso/getrandom.c >> +++ b/lib/vdso/getrandom.c >> @@ -5,6 +5,7 @@ >> >> #include <linux/array_size.h> >> #include <linux/minmax.h> >> +#include <linux/mm.h> >> #include <vdso/datapage.h> >> #include <vdso/getrandom.h> >> #include <vdso/unaligned.h> > > Looks like this should be a separate change? It is required so arm64 can use c-getrandom-y, otherwise vgetrandom.o build fails: CC arch/arm64/kernel/vdso/vgetrandom.o In file included from ./include/uapi/linux/mman.h:5, from /mnt/projects/linux/linux-git/lib/vdso/getrandom.c:13, from <command-line>: ./arch/arm64/include/asm/mman.h: In function ‘arch_calc_vm_prot_bits’: ./arch/arm64/include/asm/mman.h:14:13: error: implicit declaration of function ‘system_supports_bti’ [-Werror=implicit-function-declaration] 14 | if (system_supports_bti() && (prot & PROT_BTI)) | ^~~~~~~~~~~~~~~~~~~ ./arch/arm64/include/asm/mman.h:15:24: error: ‘VM_ARM64_BTI’ undeclared (first use in this function); did you mean ‘ARM64_BTI’? 
15 | ret |= VM_ARM64_BTI; | ^~~~~~~~~~~~ | ARM64_BTI ./arch/arm64/include/asm/mman.h:15:24: note: each undeclared identifier is reported only once for each function it appears in ./arch/arm64/include/asm/mman.h:17:13: error: implicit declaration of function ‘system_supports_mte’ [-Werror=implicit-function-declaration] 17 | if (system_supports_mte() && (prot & PROT_MTE)) | ^~~~~~~~~~~~~~~~~~~ ./arch/arm64/include/asm/mman.h:18:24: error: ‘VM_MTE’ undeclared (first use in this function) 18 | ret |= VM_MTE; | ^~~~~~ ./arch/arm64/include/asm/mman.h: In function ‘arch_calc_vm_flag_bits’: ./arch/arm64/include/asm/mman.h:32:24: error: ‘VM_MTE_ALLOWED’ undeclared (first use in this function) 32 | return VM_MTE_ALLOWED; | ^~~~~~~~~~~~~~ ./arch/arm64/include/asm/mman.h: In function ‘arch_validate_flags’: ./arch/arm64/include/asm/mman.h:59:29: error: ‘VM_MTE’ undeclared (first use in this function) 59 | return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED); | ^~~~~~ ./arch/arm64/include/asm/mman.h:59:52: error: ‘VM_MTE_ALLOWED’ undeclared (first use in this function) 59 | return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED); | ^~~~~~~~~~~~~~ arch/arm64/kernel/vdso/vgetrandom.c: In function ‘__kernel_getrandom’: arch/arm64/kernel/vdso/vgetrandom.c:18:25: error: ‘ENOSYS’ undeclared (first use in this function); did you mean ‘ENOSPC’? 18 | return -ENOSYS; | ^~~~~~ | ENOSPC cc1: some warnings being treated as errors I can move to a different patch, but this is really tied to this patch.
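The control flow of the quoted __kernel_getrandom() can be sketched in portable C, as a rough model only. The enum and the function name choose_path are hypothetical and do not exist in the kernel; the has_fpsimd flag stands in for the ALTERNATIVE() patching, which is an assumption made so the sketch can run in user space. The reading of the all-zero call with opaque_len == ~0UL as a "query" call is also an assumption based on how the generic vDSO code appears to use it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome labels; these are not kernel identifiers. */
enum path { PATH_VDSO, PATH_ENOSYS, PATH_SYSCALL };

/*
 * Mirrors the branch structure of the quoted function:
 *  - with FP/SIMD available, the generic vDSO implementation runs;
 *  - without it, the special call with no buffer, no length, no flags
 *    and opaque_len == ~0UL is answered with -ENOSYS;
 *  - every other call is forwarded to the getrandom() syscall.
 */
static enum path choose_path(bool has_fpsimd, const void *buffer, size_t len,
                             unsigned int flags, size_t opaque_len)
{
	if (has_fpsimd)
		return PATH_VDSO;
	if (opaque_len == (size_t)~0UL && !buffer && !len && !flags)
		return PATH_ENOSYS;
	return PATH_SYSCALL;
}
```

The point of the -ENOSYS branch, under the assumption above, is that a caller asking for the opaque state parameters on a CPU without FP/SIMD learns that the vDSO path is unavailable instead of silently receiving syscall output.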
On Thu, 29 Aug 2024 at 22:17, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote: > > Hook up the generic vDSO implementation to the aarch64 vDSO data page. > The _vdso_rng_data required data is placed within the _vdso_data vvar > page, by using a offset larger than the vdso_data. > > The vDSO function requires a ChaCha20 implementation that does not > write to the stack, and that can do an entire ChaCha20 permutation. > The one provided is based on the current chacha-neon-core.S and uses NEON > on the permute operation. The fallback for chips that do not support > NEON issues the syscall. > > This also passes the vdso_test_chacha test along with > vdso_test_getrandom. The vdso_test_getrandom bench-single result on > Neoverse-N1 shows: > > vdso: 25000000 times in 0.746506464 seconds > libc: 25000000 times in 8.849179444 seconds > syscall: 25000000 times in 8.818726425 seconds > > Changes from v1: > - Fixed style issues and typos. > - Added fallback for systems without NEON support. > - Avoid use of non-volatile vector registers in neon chacha20. > - Use c-getrandom-y for vgetrandom.c. > - Fixed TIMENS vdso_rnd_data access. > > Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> > --- ... 
> diff --git a/arch/arm64/kernel/vdso/vgetrandom-chacha.S b/arch/arm64/kernel/vdso/vgetrandom-chacha.S > new file mode 100644 > index 000000000000..9ebf12a09c65 > --- /dev/null > +++ b/arch/arm64/kernel/vdso/vgetrandom-chacha.S > @@ -0,0 +1,168 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +#include <linux/linkage.h> > +#include <asm/cache.h> > +#include <asm/assembler.h> > + > + .text > + > +#define state0 v0 > +#define state1 v1 > +#define state2 v2 > +#define state3 v3 > +#define copy0 v4 > +#define copy1 v5 > +#define copy2 v6 > +#define copy3 v7 > +#define copy3_d d7 > +#define one_d d16 > +#define one_q q16 > +#define tmp v17 > +#define rot8 v18 > + Please make a note somewhere around here that you are deliberately avoiding d8-d15 because they are callee-save in user space. > +/* > + * ARM64 ChaCha20 implementation meant for vDSO. Produces a given positive > + * number of blocks of output with nonce 0, taking an input key and 8-bytes > + * counter. Importantly does not spill to the stack. > + * > + * void __arch_chacha20_blocks_nostack(uint8_t *dst_bytes, > + * const uint8_t *key, > + * uint32_t *counter, > + * size_t nblocks) > + * > + * x0: output bytes > + * x1: 32-byte key input > + * x2: 8-byte counter input/output > + * x3: number of 64-byte block to write to output > + */ > +SYM_FUNC_START(__arch_chacha20_blocks_nostack) > + > + /* copy0 = "expand 32-byte k" */ > + adr_l x8, CTES > + ld1 {copy0.4s}, [x8] > + /* copy1,copy2 = key */ > + ld1 { copy1.4s, copy2.4s }, [x1] > + /* copy3 = counter || zero nonce */ > + ldr copy3_d, [x2] > + > + adr_l x8, ONE > + ldr one_q, [x8] > + > + adr_l x10, ROT8 > + ld1 {rot8.4s}, [x10] These immediate loads are forcing the vDSO to have a .rodata section, which is best avoided, given that this is mapped into every user space program. Either use the existing mov_q macro and then move the values into SIMD registers, or compose the required vectors in a different way. 
E.g., with one_v == v16, movi one_v.2s, #1 uzp1 one_v.4s, one_v.4s, one_v.4s puts the correct value in one_d, uses 1 instruction and 16 bytes of rodata less, and avoids a memory access. The ROT8 + tbl can be replaced by shl/sri (see below) > +.Lblock: > + /* copy state to auxiliary vectors for the final add after the permute. */ > + mov state0.16b, copy0.16b > + mov state1.16b, copy1.16b > + mov state2.16b, copy2.16b > + mov state3.16b, copy3.16b > + > + mov w4, 20 > +.Lpermute: > + /* > + * Permute one 64-byte block where the state matrix is stored in the four NEON > + * registers state0-state3. It performs matrix operations on four words in parallel, > + * but requires shuffling to rearrange the words after each round. > + */ > + > +.Ldoubleround: > + /* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ > + add state0.4s, state0.4s, state1.4s > + eor state3.16b, state3.16b, state0.16b > + rev32 state3.8h, state3.8h > + > + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ > + add state2.4s, state2.4s, state3.4s > + eor tmp.16b, state1.16b, state2.16b > + shl state1.4s, tmp.4s, #12 > + sri state1.4s, tmp.4s, #20 > + > + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ > + add state0.4s, state0.4s, state1.4s > + eor state3.16b, state3.16b, state0.16b > + tbl state3.16b, {state3.16b}, rot8.16b > + This can be changed to the below, removing the need for the ROT8 vector eor tmp.16b, state3.16b, state0.16b shl state3.4s, tmp.4s, #8 sri state3.4s, tmp.4s, #24 > + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ > + add state2.4s, state2.4s, state3.4s > + eor tmp.16b, state1.16b, state2.16b > + shl state1.4s, tmp.4s, #7 > + sri state1.4s, tmp.4s, #25 > + > + /* state1[0,1,2,3] = state1[1,2,3,0] */ > + ext state1.16b, state1.16b, state1.16b, #4 > + /* state2[0,1,2,3] = state2[2,3,0,1] */ > + ext state2.16b, state2.16b, state2.16b, #8 > + /* state3[0,1,2,3] = state3[1,2,3,0] */ > + ext state3.16b, state3.16b, state3.16b, #12 > + > + 
/* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ > + add state0.4s, state0.4s, state1.4s > + eor state3.16b, state3.16b, state0.16b > + rev32 state3.8h, state3.8h > + > + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ > + add state2.4s, state2.4s, state3.4s > + eor tmp.16b, state1.16b, state2.16b > + shl state1.4s, tmp.4s, #12 > + sri state1.4s, tmp.4s, #20 > + > + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ > + add state0.4s, state0.4s, state1.4s > + eor state3.16b, state3.16b, state0.16b > + tbl state3.16b, {state3.16b}, rot8.16b > + > + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ > + add state2.4s, state2.4s, state3.4s > + eor tmp.16b, state1.16b, state2.16b > + shl state1.4s, tmp.4s, #7 > + sri state1.4s, tmp.4s, #25 > + > + /* state1[0,1,2,3] = state1[3,0,1,2] */ > + ext state1.16b, state1.16b, state1.16b, #12 > + /* state2[0,1,2,3] = state2[2,3,0,1] */ > + ext state2.16b, state2.16b, state2.16b, #8 > + /* state3[0,1,2,3] = state3[1,2,3,0] */ > + ext state3.16b, state3.16b, state3.16b, #4 > + > + subs w4, w4, #2 > + b.ne .Ldoubleround > + > + /* output0 = state0 + state0 */ > + add state0.4s, state0.4s, copy0.4s > + /* output1 = state1 + state1 */ > + add state1.4s, state1.4s, copy1.4s > + /* output2 = state2 + state2 */ > + add state2.4s, state2.4s, copy2.4s > + /* output2 = state3 + state3 */ > + add state3.4s, state3.4s, copy3.4s > + st1 { state0.4s - state3.4s }, [x0] > + > + /* ++copy3.counter */ > + add copy3_d, copy3_d, one_d > + This 'add' clears the upper half of the SIMD register, which is where the zero nonce lives. So this happens to be correct, but it is not very intuitive, so perhaps a comment would be in order here. > + /* output += 64, --nblocks */ > + add x0, x0, 64 > + subs x3, x3, #1 > + b.ne .Lblock > + > + /* counter = copy3.counter */ > + str copy3_d, [x2] > + > + /* Zero out the potentially sensitive regs, in case nothing uses these again. 
*/ > + eor state0.16b, state0.16b, state0.16b > + eor state1.16b, state1.16b, state1.16b > + eor state2.16b, state2.16b, state2.16b > + eor state3.16b, state3.16b, state3.16b > + eor copy1.16b, copy1.16b, copy1.16b > + eor copy2.16b, copy2.16b, copy2.16b This is not x86 - no need to use XOR to clear registers, you can just use 'movi reg.16b, #0' here. > + ret > +SYM_FUNC_END(__arch_chacha20_blocks_nostack) > + > + .section ".rodata", "a", %progbits > + .align L1_CACHE_SHIFT > + > +CTES: .word 1634760805, 857760878, 2036477234, 1797285236 > +ONE: .xword 1, 0 > +ROT8: .word 0x02010003, 0x06050407, 0x0a09080b, 0x0e0d0c0f > + > +emit_aarch64_feature_1_and ...
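Two of the suggestions in this review can be sanity-checked in portable C. The sketch below models one 32-bit lane of the shl/sri pair and of the ROT8 byte shuffle, and the four 32-bit lanes of the movi/uzp1 sequence. All function names are hypothetical, and the lane models are a simplified reading of the A64 instructions, not a substitute for testing the assembly itself.

```c
#include <assert.h>
#include <stdint.h>

/* rotl32 by 8 modelled as "shl #8" followed by "sri #24". */
static uint32_t rotl8_shl_sri(uint32_t x)
{
	uint32_t r = x << 8;            /* shl: the low 8 bits become zero */

	/* sri #24: keep the top 24 bits of r, insert x >> 24 below them */
	return (r & ~0xffu) | (x >> 24);
}

/* rotl32 by 8 modelled as the tbl byte shuffle: out byte i = in byte idx[i]. */
static uint32_t rotl8_tbl(uint32_t x)
{
	/* First ROT8 word (0x02010003) read as little-endian bytes. */
	static const uint8_t idx[4] = { 3, 0, 1, 2 };
	uint32_t r = 0;
	int i;

	for (i = 0; i < 4; i++) {
		uint32_t byte = (x >> (8 * idx[i])) & 0xff;

		r |= byte << (8 * i);
	}
	return r;
}

/* "movi v.2s, #1" then "uzp1 v.4s, v.4s, v.4s" on four 32-bit lanes. */
static uint64_t one_d_from_movi_uzp1(void)
{
	/* movi with a .2s arrangement sets two lanes and zeroes the upper half */
	uint32_t v[4] = { 1, 1, 0, 0 };
	/* uzp1 picks the even lanes of the first operand, then of the second */
	uint32_t r[4] = { v[0], v[2], v[0], v[2] };

	/* the low 64 bits are what one_d would read */
	return (uint64_t)r[0] | ((uint64_t)r[1] << 32);
}
```

Both rotation models agree for every input, which is why the ROT8 table can be dropped, and the movi/uzp1 model yields the 64-bit value 1 in the low half, matching the ONE constant it replaces.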
On Fri, Aug 30, 2024 at 02:04:39PM +0200, Jason A. Donenfeld wrote:
> > > +SYM_FUNC_START(__arch_chacha20_blocks_nostack)
>
> > Is there any way we can reuse the existing code in
> > crypto/chacha-neon-core.S for this? It looks to my untrained eye like
> > this is an arbitrarily different implementation to what we already have.
>
> Nope, it is indeed different, and not arbitrarily so. This patch is
> mirroring exactly what we did on x86.

It's probably worth some comments or something explaining what's going
on with that (the commit log for the x86 patch mentions that it's that
the vDSO needs a version that doesn't write to the stack).
On Thu, Aug 29, 2024 at 08:17:14PM +0000, Adhemerval Zanella wrote:
> Hook up the generic vDSO implementation to the aarch64 vDSO data page.
> The _vdso_rng_data required data is placed within the _vdso_data vvar
> page, by using a offset larger than the vdso_data.

This exposes some preexisting compiler warnings in the getrandom test
when built with clang:

vdso_test_getrandom.c:145:40: warning: omitting the parameter name in a function definition is a C23 extension [-Wc23-extensions]
  145 | static void *test_vdso_getrandom(void *)
      |                                        ^
vdso_test_getrandom.c:155:40: warning: omitting the parameter name in a function definition is a C23 extension [-Wc23-extensions]
  155 | static void *test_libc_getrandom(void *)
      |                                        ^
vdso_test_getrandom.c:165:43: warning: omitting the parameter name in a function definition is a C23 extension [-Wc23-extensions]
  165 | static void *test_syscall_getrandom(void *)
      |                                           ^

which it'd be good to get fixed before merging.
On Fri, Aug 30, 2024 at 04:19:00PM +0100, Mark Brown wrote:
> On Thu, Aug 29, 2024 at 08:17:14PM +0000, Adhemerval Zanella wrote:
> > Hook up the generic vDSO implementation to the aarch64 vDSO data page.
> > The _vdso_rng_data required data is placed within the _vdso_data vvar
> > page, by using a offset larger than the vdso_data.
>
> This exposes some preexisting compiler warnings in the getrandom test
> when built with clang:
>
> vdso_test_getrandom.c:145:40: warning: omitting the parameter name in a function definition is a C23 extension [-Wc23-extensions]
>   145 | static void *test_vdso_getrandom(void *)
>       |                                        ^
> vdso_test_getrandom.c:155:40: warning: omitting the parameter name in a function definition is a C23 extension [-Wc23-extensions]
>   155 | static void *test_libc_getrandom(void *)
>       |                                        ^
> vdso_test_getrandom.c:165:43: warning: omitting the parameter name in a function definition is a C23 extension [-Wc23-extensions]
>   165 | static void *test_syscall_getrandom(void *)
>       |                                           ^
>
> which it'd be good to get fixed before merging.

That's my bug. I'll fix that up in the tree and CC you on it. Thanks for
pointing it out.

Jason
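For readers unfamiliar with the -Wc23-extensions warning: before C23, a function definition must name every parameter, even an unused one. The fix that actually went into the tree is not shown in this thread, so the following is only a minimal sketch of the usual pattern. The function name test_thread_body and the parameter name ctx are hypothetical stand-ins for the selftest's thread functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * The parameter is given a name so that the definition is valid in C
 * standards older than C23. The cast to void documents that it is
 * deliberately unused and keeps -Wunused-parameter quiet.
 */
static void *test_thread_body(void *ctx)
{
	(void)ctx;
	return NULL;
}
```

The signature stays compatible with pthread_create(), which is presumably why the selftest functions take a void pointer in the first place.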
Hi Adhemerval,
kernel test robot noticed the following build errors:
[auto build test ERROR on crng-random/master]
[also build test ERROR on next-20240830]
[cannot apply to arm64/for-next/core shuah-kselftest/next shuah-kselftest/fixes linus/master v6.11-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Adhemerval-Zanella/aarch64-vdso-Wire-up-getrandom-vDSO-implementation/20240830-041912
base: https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git master
patch link: https://lore.kernel.org/r/20240829201728.2825-1-adhemerval.zanella%40linaro.org
patch subject: [PATCH v2] aarch64: vdso: Wire up getrandom() vDSO implementation
config: arm64-allyesconfig (https://download.01.org/0day-ci/archive/20240831/202408310030.S5ZNwLWz-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 46fe36a4295f05d5d3731762e31fc4e6e99863e9)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240831/202408310030.S5ZNwLWz-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408310030.S5ZNwLWz-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from arch/arm64/kernel/asm-offsets.c:10:
In file included from include/linux/arm_sdei.h:8:
In file included from include/acpi/ghes.h:5:
In file included from include/acpi/apei.h:9:
In file included from include/linux/acpi.h:39:
In file included from include/acpi/acpi_io.h:7:
In file included from arch/arm64/include/asm/acpi.h:14:
In file included from include/linux/memblock.h:12:
In file included from include/linux/mm.h:2228:
include/linux/vmstat.h:503:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
503 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
504 | item];
| ~~~~
include/linux/vmstat.h:510:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
510 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
511 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
include/linux/vmstat.h:517:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
517 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
include/linux/vmstat.h:523:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
523 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
524 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
4 warnings generated.
In file included from <built-in>:4:
In file included from lib/vdso/getrandom.c:8:
In file included from include/linux/mm.h:2228:
include/linux/vmstat.h:503:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
503 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
504 | item];
| ~~~~
include/linux/vmstat.h:510:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
510 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
511 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
include/linux/vmstat.h:517:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
517 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
include/linux/vmstat.h:523:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
523 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
524 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
In file included from <built-in>:4:
In file included from lib/vdso/getrandom.c:12:
In file included from arch/arm64/include/asm/vdso/getrandom.h:8:
>> arch/arm64/include/asm/vdso.h:25:10: fatal error: 'generated/vdso-offsets.h' file not found
25 | #include <generated/vdso-offsets.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
4 warnings and 1 error generated.
make[3]: *** [scripts/Makefile.build:244: arch/arm64/kernel/vdso/vgetrandom.o] Error 1
make[3]: Target 'include/generated/vdso-offsets.h' not remade because of errors.
make[3]: Target 'arch/arm64/kernel/vdso/vdso.so' not remade because of errors.
make[2]: *** [arch/arm64/Makefile:217: vdso_prepare] Error 2
make[2]: Target 'prepare' not remade because of errors.
make[1]: *** [Makefile:224: __sub-make] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [Makefile:224: __sub-make] Error 2
make: Target 'prepare' not remade because of errors.
vim +25 arch/arm64/include/asm/vdso.h
0a7927d2b89e55 Adhemerval Zanella 2024-08-29 24
9031fefde6f2ac Will Deacon 2012-03-05 @25 #include <generated/vdso-offsets.h>
9031fefde6f2ac Will Deacon 2012-03-05 26
On 30/08/24 11:11, Ard Biesheuvel wrote: > On Thu, 29 Aug 2024 at 22:17, Adhemerval Zanella > <adhemerval.zanella@linaro.org> wrote: >> >> Hook up the generic vDSO implementation to the aarch64 vDSO data page. >> The _vdso_rng_data required data is placed within the _vdso_data vvar >> page, by using a offset larger than the vdso_data. >> >> The vDSO function requires a ChaCha20 implementation that does not >> write to the stack, and that can do an entire ChaCha20 permutation. >> The one provided is based on the current chacha-neon-core.S and uses NEON >> on the permute operation. The fallback for chips that do not support >> NEON issues the syscall. >> >> This also passes the vdso_test_chacha test along with >> vdso_test_getrandom. The vdso_test_getrandom bench-single result on >> Neoverse-N1 shows: >> >> vdso: 25000000 times in 0.746506464 seconds >> libc: 25000000 times in 8.849179444 seconds >> syscall: 25000000 times in 8.818726425 seconds >> >> Changes from v1: >> - Fixed style issues and typos. >> - Added fallback for systems without NEON support. >> - Avoid use of non-volatile vector registers in neon chacha20. >> - Use c-getrandom-y for vgetrandom.c. >> - Fixed TIMENS vdso_rnd_data access. >> >> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> >> --- > ... 
>> diff --git a/arch/arm64/kernel/vdso/vgetrandom-chacha.S b/arch/arm64/kernel/vdso/vgetrandom-chacha.S >> new file mode 100644 >> index 000000000000..9ebf12a09c65 >> --- /dev/null >> +++ b/arch/arm64/kernel/vdso/vgetrandom-chacha.S >> @@ -0,0 +1,168 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> + >> +#include <linux/linkage.h> >> +#include <asm/cache.h> >> +#include <asm/assembler.h> >> + >> + .text >> + >> +#define state0 v0 >> +#define state1 v1 >> +#define state2 v2 >> +#define state3 v3 >> +#define copy0 v4 >> +#define copy1 v5 >> +#define copy2 v6 >> +#define copy3 v7 >> +#define copy3_d d7 >> +#define one_d d16 >> +#define one_q q16 >> +#define tmp v17 >> +#define rot8 v18 >> + > > Please make a note somewhere around here that you are deliberately > avoiding d8-d15 because they are callee-save in user space. Ack. > >> +/* >> + * ARM64 ChaCha20 implementation meant for vDSO. Produces a given positive >> + * number of blocks of output with nonce 0, taking an input key and 8-bytes >> + * counter. Importantly does not spill to the stack. >> + * >> + * void __arch_chacha20_blocks_nostack(uint8_t *dst_bytes, >> + * const uint8_t *key, >> + * uint32_t *counter, >> + * size_t nblocks) >> + * >> + * x0: output bytes >> + * x1: 32-byte key input >> + * x2: 8-byte counter input/output >> + * x3: number of 64-byte block to write to output >> + */ >> +SYM_FUNC_START(__arch_chacha20_blocks_nostack) >> + >> + /* copy0 = "expand 32-byte k" */ >> + adr_l x8, CTES >> + ld1 {copy0.4s}, [x8] >> + /* copy1,copy2 = key */ >> + ld1 { copy1.4s, copy2.4s }, [x1] >> + /* copy3 = counter || zero nonce */ >> + ldr copy3_d, [x2] >> + >> + adr_l x8, ONE >> + ldr one_q, [x8] >> + >> + adr_l x10, ROT8 >> + ld1 {rot8.4s}, [x10] > > These immediate loads are forcing the vDSO to have a .rodata section, > which is best avoided, given that this is mapped into every user space > program. 
> > Either use the existing mov_q macro and then move the values into SIMD > registers, or compose the required vectors in a different way. Ack, mov_q seems suffice here. > > E.g., with one_v == v16, > > movi one_v.2s, #1 > uzp1 one_v.4s, one_v.4s, one_v.4s > > puts the correct value in one_d, uses 1 instruction and 16 bytes of > rodata less, and avoids a memory access. Ack. > > The ROT8 + tbl can be replaced by shl/sri (see below) > >> +.Lblock: >> + /* copy state to auxiliary vectors for the final add after the permute. */ >> + mov state0.16b, copy0.16b >> + mov state1.16b, copy1.16b >> + mov state2.16b, copy2.16b >> + mov state3.16b, copy3.16b >> + >> + mov w4, 20 >> +.Lpermute: >> + /* >> + * Permute one 64-byte block where the state matrix is stored in the four NEON >> + * registers state0-state3. It performs matrix operations on four words in parallel, >> + * but requires shuffling to rearrange the words after each round. >> + */ >> + >> +.Ldoubleround: >> + /* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ >> + add state0.4s, state0.4s, state1.4s >> + eor state3.16b, state3.16b, state0.16b >> + rev32 state3.8h, state3.8h >> + >> + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ >> + add state2.4s, state2.4s, state3.4s >> + eor tmp.16b, state1.16b, state2.16b >> + shl state1.4s, tmp.4s, #12 >> + sri state1.4s, tmp.4s, #20 >> + >> + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ >> + add state0.4s, state0.4s, state1.4s >> + eor state3.16b, state3.16b, state0.16b >> + tbl state3.16b, {state3.16b}, rot8.16b >> + > > This can be changed to the below, removing the need for the ROT8 vector > > eor tmp.16b, state3.16b, state0.16b > shl state3.4s, tmp.4s, #8 > sri state3.4s, tmp.4s, #24 > Ack. 
>> + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ >> + add state2.4s, state2.4s, state3.4s >> + eor tmp.16b, state1.16b, state2.16b >> + shl state1.4s, tmp.4s, #7 >> + sri state1.4s, tmp.4s, #25 >> + >> + /* state1[0,1,2,3] = state1[1,2,3,0] */ >> + ext state1.16b, state1.16b, state1.16b, #4 >> + /* state2[0,1,2,3] = state2[2,3,0,1] */ >> + ext state2.16b, state2.16b, state2.16b, #8 >> + /* state3[0,1,2,3] = state3[1,2,3,0] */ >> + ext state3.16b, state3.16b, state3.16b, #12 >> + >> + /* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */ >> + add state0.4s, state0.4s, state1.4s >> + eor state3.16b, state3.16b, state0.16b >> + rev32 state3.8h, state3.8h >> + >> + /* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */ >> + add state2.4s, state2.4s, state3.4s >> + eor tmp.16b, state1.16b, state2.16b >> + shl state1.4s, tmp.4s, #12 >> + sri state1.4s, tmp.4s, #20 >> + >> + /* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */ >> + add state0.4s, state0.4s, state1.4s >> + eor state3.16b, state3.16b, state0.16b >> + tbl state3.16b, {state3.16b}, rot8.16b >> + >> + /* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */ >> + add state2.4s, state2.4s, state3.4s >> + eor tmp.16b, state1.16b, state2.16b >> + shl state1.4s, tmp.4s, #7 >> + sri state1.4s, tmp.4s, #25 >> + >> + /* state1[0,1,2,3] = state1[3,0,1,2] */ >> + ext state1.16b, state1.16b, state1.16b, #12 >> + /* state2[0,1,2,3] = state2[2,3,0,1] */ >> + ext state2.16b, state2.16b, state2.16b, #8 >> + /* state3[0,1,2,3] = state3[1,2,3,0] */ >> + ext state3.16b, state3.16b, state3.16b, #4 >> + >> + subs w4, w4, #2 >> + b.ne .Ldoubleround >> + >> + /* output0 = state0 + state0 */ >> + add state0.4s, state0.4s, copy0.4s >> + /* output1 = state1 + state1 */ >> + add state1.4s, state1.4s, copy1.4s >> + /* output2 = state2 + state2 */ >> + add state2.4s, state2.4s, copy2.4s >> + /* output2 = state3 + state3 */ >> + add state3.4s, state3.4s, copy3.4s >> + st1 { state0.4s - state3.4s }, 
[x0] >> + >> + /* ++copy3.counter */ >> + add copy3_d, copy3_d, one_d >> + > > This 'add' clears the upper half of the SIMD register, which is where > the zero nonce lives. So this happens to be correct, but it is not > very intuitive, so perhaps a comment would be in order here. Ack, will do. > >> + /* output += 64, --nblocks */ >> + add x0, x0, 64 >> + subs x3, x3, #1 >> + b.ne .Lblock >> + >> + /* counter = copy3.counter */ >> + str copy3_d, [x2] >> + >> + /* Zero out the potentially sensitive regs, in case nothing uses these again. */ >> + eor state0.16b, state0.16b, state0.16b >> + eor state1.16b, state1.16b, state1.16b >> + eor state2.16b, state2.16b, state2.16b >> + eor state3.16b, state3.16b, state3.16b >> + eor copy1.16b, copy1.16b, copy1.16b >> + eor copy2.16b, copy2.16b, copy2.16b > > This is not x86 - no need to use XOR to clear registers, you can just > use 'movi reg.16b, #0' here. Ack. > >> + ret >> +SYM_FUNC_END(__arch_chacha20_blocks_nostack) >> + >> + .section ".rodata", "a", %progbits >> + .align L1_CACHE_SHIFT >> + >> +CTES: .word 1634760805, 857760878, 2036477234, 1797285236 >> +ONE: .xword 1, 0 >> +ROT8: .word 0x02010003, 0x06050407, 0x0a09080b, 0x0e0d0c0f >> + >> +emit_aarch64_feature_1_and > ...
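As a cross-check for the points acknowledged above, here is a hedged C reference model of what the routine is described as computing: the "expand 32-byte k" constants, a 256-bit key, a 64-bit block counter in state words 12 and 13, and an all-zero nonce in words 14 and 15. The name chacha20_blocks_ref is hypothetical. Unlike the assembly, the model takes the key and writes the output as host-order 32-bit words, which is an assumption made to keep it independent of byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int n)
{
	return (x << n) | (x >> (32 - n));
}

/* One quarter round with the 16, 12, 8, 7 rotation schedule. */
#define QR(a, b, c, d) do {			\
	a += b; d = rotl32(d ^ a, 16);		\
	c += d; b = rotl32(b ^ c, 12);		\
	a += b; d = rotl32(d ^ a, 8);		\
	c += d; b = rotl32(b ^ c, 7);		\
} while (0)

/* Writes 16 words per block to out and advances *counter by nblocks. */
static void chacha20_blocks_ref(uint32_t *out, const uint32_t key[8],
				uint64_t *counter, size_t nblocks)
{
	while (nblocks--) {
		uint32_t in[16], x[16];
		int i;

		/* Row 0: the CTES constants, here in hexadecimal. */
		in[0] = 0x61707865; in[1] = 0x3320646e;
		in[2] = 0x79622d32; in[3] = 0x6b206574;
		/* Rows 1 and 2: the key. */
		for (i = 0; i < 8; i++)
			in[4 + i] = key[i];
		/* Row 3: 64-bit counter, then the zero nonce. */
		in[12] = (uint32_t)*counter;
		in[13] = (uint32_t)(*counter >> 32);
		in[14] = 0;
		in[15] = 0;

		for (i = 0; i < 16; i++)
			x[i] = in[i];

		/* 20 rounds as 10 double rounds: columns, then diagonals. */
		for (i = 0; i < 20; i += 2) {
			QR(x[0], x[4], x[8],  x[12]);
			QR(x[1], x[5], x[9],  x[13]);
			QR(x[2], x[6], x[10], x[14]);
			QR(x[3], x[7], x[11], x[15]);
			QR(x[0], x[5], x[10], x[15]);
			QR(x[1], x[6], x[11], x[12]);
			QR(x[2], x[7], x[8],  x[13]);
			QR(x[3], x[4], x[9],  x[14]);
		}

		/* Final add of the saved input state (the copy0-copy3 role). */
		for (i = 0; i < 16; i++)
			out[i] = x[i] + in[i];

		out += 16;
		++*counter;	/* only the counter changes; the nonce stays zero */
	}
}
```

With an all-zero key and a counter of 0, the first output word should be 0xade0b876, which matches the zero-key test vector published in RFC 8439. The model also makes the nonce handling explicit: words 14 and 15 are rebuilt as zero for every block, which is the property the scalar add on copy3_d relies on implicitly.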
Hi Adhemerval,
kernel test robot noticed the following build errors:
[auto build test ERROR on crng-random/master]
[also build test ERROR on next-20240830]
[cannot apply to arm64/for-next/core shuah-kselftest/next shuah-kselftest/fixes linus/master v6.11-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Adhemerval-Zanella/aarch64-vdso-Wire-up-getrandom-vDSO-implementation/20240830-041912
base: https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git master
patch link: https://lore.kernel.org/r/20240829201728.2825-1-adhemerval.zanella%40linaro.org
patch subject: [PATCH v2] aarch64: vdso: Wire up getrandom() vDSO implementation
config: arm64-defconfig (https://download.01.org/0day-ci/archive/20240831/202408310834.qh5oO1N6-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 13.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240831/202408310834.qh5oO1N6-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408310834.qh5oO1N6-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from arch/arm64/include/asm/vdso/getrandom.h:8,
from lib/vdso/getrandom.c:12,
from <command-line>:
>> arch/arm64/include/asm/vdso.h:25:10: fatal error: generated/vdso-offsets.h: No such file or directory
25 | #include <generated/vdso-offsets.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[3]: *** [scripts/Makefile.build:244: arch/arm64/kernel/vdso/vgetrandom.o] Error 1
make[3]: Target 'include/generated/vdso-offsets.h' not remade because of errors.
make[3]: Target 'arch/arm64/kernel/vdso/vdso.so' not remade because of errors.
make[2]: *** [arch/arm64/Makefile:217: vdso_prepare] Error 2
make[2]: Target 'prepare' not remade because of errors.
make[1]: *** [Makefile:224: __sub-make] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [Makefile:224: __sub-make] Error 2
make: Target 'prepare' not remade because of errors.
vim +25 arch/arm64/include/asm/vdso.h
0a7927d2b89e55 Adhemerval Zanella 2024-08-29 24
9031fefde6f2ac Will Deacon 2012-03-05 @25 #include <generated/vdso-offsets.h>
9031fefde6f2ac Will Deacon 2012-03-05 26
Hey Christophe (for header logic) & Will (for arm64 stuff), On Fri, Aug 30, 2024 at 09:28:29AM -0300, Adhemerval Zanella Netto wrote: > >> diff --git a/lib/vdso/getrandom.c b/lib/vdso/getrandom.c > >> index 938ca539aaa6..7c9711248d9b 100644 > >> --- a/lib/vdso/getrandom.c > >> +++ b/lib/vdso/getrandom.c > >> @@ -5,6 +5,7 @@ > >> > >> #include <linux/array_size.h> > >> #include <linux/minmax.h> > >> +#include <linux/mm.h> > >> #include <vdso/datapage.h> > >> #include <vdso/getrandom.h> > >> #include <vdso/unaligned.h> > > > > Looks like this should be a separate change? > > > It is required so arm64 can use c-getrandom-y, otherwise vgetrandom.o build > fails: > > CC arch/arm64/kernel/vdso/vgetrandom.o > In file included from ./include/uapi/linux/mman.h:5, > from /mnt/projects/linux/linux-git/lib/vdso/getrandom.c:13, > from <command-line>: > ./arch/arm64/include/asm/mman.h: In function ‘arch_calc_vm_prot_bits’: > ./arch/arm64/include/asm/mman.h:14:13: error: implicit declaration of function ‘system_supports_bti’ [-Werror=implicit-function-declaration] > 14 | if (system_supports_bti() && (prot & PROT_BTI)) > | ^~~~~~~~~~~~~~~~~~~ > ./arch/arm64/include/asm/mman.h:15:24: error: ‘VM_ARM64_BTI’ undeclared (first use in this function); did you mean ‘ARM64_BTI’? 
> 15 | ret |= VM_ARM64_BTI; > | ^~~~~~~~~~~~ > | ARM64_BTI > ./arch/arm64/include/asm/mman.h:15:24: note: each undeclared identifier is reported only once for each function it appears in > ./arch/arm64/include/asm/mman.h:17:13: error: implicit declaration of function ‘system_supports_mte’ [-Werror=implicit-function-declaration] > 17 | if (system_supports_mte() && (prot & PROT_MTE)) > | ^~~~~~~~~~~~~~~~~~~ > ./arch/arm64/include/asm/mman.h:18:24: error: ‘VM_MTE’ undeclared (first use in this function) > 18 | ret |= VM_MTE; > | ^~~~~~ > ./arch/arm64/include/asm/mman.h: In function ‘arch_calc_vm_flag_bits’: > ./arch/arm64/include/asm/mman.h:32:24: error: ‘VM_MTE_ALLOWED’ undeclared (first use in this function) > 32 | return VM_MTE_ALLOWED; > | ^~~~~~~~~~~~~~ > ./arch/arm64/include/asm/mman.h: In function ‘arch_validate_flags’: > ./arch/arm64/include/asm/mman.h:59:29: error: ‘VM_MTE’ undeclared (first use in this function) > 59 | return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED); > | ^~~~~~ > ./arch/arm64/include/asm/mman.h:59:52: error: ‘VM_MTE_ALLOWED’ undeclared (first use in this function) > 59 | return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED); > | ^~~~~~~~~~~~~~ > arch/arm64/kernel/vdso/vgetrandom.c: In function ‘__kernel_getrandom’: > arch/arm64/kernel/vdso/vgetrandom.c:18:25: error: ‘ENOSYS’ undeclared (first use in this function); did you mean ‘ENOSPC’? > 18 | return -ENOSYS; > | ^~~~~~ > | ENOSPC > cc1: some warnings being treated as errors > > I can move to a different patch, but this is really tied to this patch. Adhemerval kept this change in this patch for v3, which, if it's necessary, is fine with me. But I was looking to see if there was another way of doing it, because including linux/mm.h inside of vdso code is kind of contrary to your project with e379299fe0b3 ("random: vDSO: minimize and simplify header includes"). getrandom.c includes uapi/linux/mman.h for the mmap constants. That seems fine; it's userspace code after all. 
But then uapi/linux/mman.h has this:

    #include <asm/mman.h>
    #include <asm-generic/hugetlb_encode.h>
    #include <linux/types.h>

The asm-generic/ one resolves to uapi/asm-generic. But the asm/ one
resolves to arch code, which is where we then get in trouble on ARM,
where arch/arm64/include/asm/mman.h has all sorts of kernel code in it.

Maybe, instead, it should resolve to arch/arm64/include/uapi/asm/mman.h,
which is the header that userspace actually uses in normal user code?

Is this a makefile problem? What's going on here? Seems like this is
something worth sorting out. Or I can take Adhemerval's v3 as-is and
we'll grit our teeth and work it out later, as you prefer. But I thought
I should mention it.

Thoughts?

Jason
Le 02/09/2024 à 15:11, Jason A. Donenfeld a écrit : > Hey Christophe (for header logic) & Will (for arm64 stuff), > > On Fri, Aug 30, 2024 at 09:28:29AM -0300, Adhemerval Zanella Netto wrote: >>>> diff --git a/lib/vdso/getrandom.c b/lib/vdso/getrandom.c >>>> index 938ca539aaa6..7c9711248d9b 100644 >>>> --- a/lib/vdso/getrandom.c >>>> +++ b/lib/vdso/getrandom.c >>>> @@ -5,6 +5,7 @@ >>>> >>>> #include <linux/array_size.h> >>>> #include <linux/minmax.h> >>>> +#include <linux/mm.h> >>>> #include <vdso/datapage.h> >>>> #include <vdso/getrandom.h> >>>> #include <vdso/unaligned.h> >>> >>> Looks like this should be a separate change? >> >> >> It is required so arm64 can use c-getrandom-y, otherwise vgetrandom.o build >> fails: >> >> CC arch/arm64/kernel/vdso/vgetrandom.o >> In file included from ./include/uapi/linux/mman.h:5, >> from /mnt/projects/linux/linux-git/lib/vdso/getrandom.c:13, >> from <command-line>: >> ./arch/arm64/include/asm/mman.h: In function ‘arch_calc_vm_prot_bits’: >> ./arch/arm64/include/asm/mman.h:14:13: error: implicit declaration of function ‘system_supports_bti’ [-Werror=implicit-function-declaration] >> 14 | if (system_supports_bti() && (prot & PROT_BTI)) >> | ^~~~~~~~~~~~~~~~~~~ >> ./arch/arm64/include/asm/mman.h:15:24: error: ‘VM_ARM64_BTI’ undeclared (first use in this function); did you mean ‘ARM64_BTI’? 
>> 15 | ret |= VM_ARM64_BTI; >> | ^~~~~~~~~~~~ >> | ARM64_BTI >> ./arch/arm64/include/asm/mman.h:15:24: note: each undeclared identifier is reported only once for each function it appears in >> ./arch/arm64/include/asm/mman.h:17:13: error: implicit declaration of function ‘system_supports_mte’ [-Werror=implicit-function-declaration] >> 17 | if (system_supports_mte() && (prot & PROT_MTE)) >> | ^~~~~~~~~~~~~~~~~~~ >> ./arch/arm64/include/asm/mman.h:18:24: error: ‘VM_MTE’ undeclared (first use in this function) >> 18 | ret |= VM_MTE; >> | ^~~~~~ >> ./arch/arm64/include/asm/mman.h: In function ‘arch_calc_vm_flag_bits’: >> ./arch/arm64/include/asm/mman.h:32:24: error: ‘VM_MTE_ALLOWED’ undeclared (first use in this function) >> 32 | return VM_MTE_ALLOWED; >> | ^~~~~~~~~~~~~~ >> ./arch/arm64/include/asm/mman.h: In function ‘arch_validate_flags’: >> ./arch/arm64/include/asm/mman.h:59:29: error: ‘VM_MTE’ undeclared (first use in this function) >> 59 | return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED); >> | ^~~~~~ >> ./arch/arm64/include/asm/mman.h:59:52: error: ‘VM_MTE_ALLOWED’ undeclared (first use in this function) >> 59 | return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED); >> | ^~~~~~~~~~~~~~ >> arch/arm64/kernel/vdso/vgetrandom.c: In function ‘__kernel_getrandom’: >> arch/arm64/kernel/vdso/vgetrandom.c:18:25: error: ‘ENOSYS’ undeclared (first use in this function); did you mean ‘ENOSPC’? >> 18 | return -ENOSYS; >> | ^~~~~~ >> | ENOSPC >> cc1: some warnings being treated as errors >> >> I can move to a different patch, but this is really tied to this patch. > > Adhemerval kept this change in this patch for v3, which, if it's > necessary, is fine with me. But I was looking to see if there was > another way of doing it, because including linux/mm.h inside of vdso > code is kind of contrary to your project with e379299fe0b3 ("random: > vDSO: minimize and simplify header includes"). > > getrandom.c includes uapi/linux/mman.h for the mmap constants. 
> That seems fine; it's userspace code after all. But then uapi/linux/mman.h
> has this:
>
>     #include <asm/mman.h>
>     #include <asm-generic/hugetlb_encode.h>
>     #include <linux/types.h>
>
> The asm-generic/ one resolves to uapi/asm-generic. But the asm/ one
> resolves to arch code, which is where we then get in trouble on ARM,
> where arch/arm64/include/asm/mman.h has all sorts of kernel code in it.
>
> Maybe, instead, it should resolve to arch/arm64/include/uapi/asm/mman.h,
> which is the header that userspace actually uses in normal user code?
>
> Is this a makefile problem? What's going on here? Seems like this is
> something worth sorting out. Or I can take Adhemerval's v3 as-is and
> we'll grit our teeth and work it out later, as you prefer. But I thought
> I should mention it.

That's a tricky problem, I also have it on powerpc, see patch 5, I
solved it that way:

In the Makefile:

-ccflags-y := -fno-common -fno-builtin
+ccflags-y := -fno-common -fno-builtin -DBUILD_VDSO

In arch/powerpc/include/asm/mman.h:

diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 17a77d47ed6d..42a51a993d94 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -6,7 +6,7 @@

 #include <uapi/asm/mman.h>

-#ifdef CONFIG_PPC64
+#if defined(CONFIG_PPC64) && !defined(BUILD_VDSO)

 #include <asm/cputable.h>
 #include <linux/mm.h>

So that the only thing that remains in arch/powerpc/include/asm/mman.h
when building a VDSO is #include <uapi/asm/mman.h>

I got the idea from ARM64, they use something similar in their
arch/arm64/include/asm/rwonce.h

Christophe
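A minimal sketch of the guard pattern described above, for readers who have not seen it before. This is not the powerpc or arm64 header; every DEMO_/demo_ name is hypothetical, and the constant value is only illustrative. The idea is that the uapi-style constants stay visible unconditionally, while the kernel-only helpers disappear when the vDSO build passes a define on the command line:

```c
#include <assert.h>

/*
 * Always visible: stands in for what <uapi/asm/mman.h> provides.
 * The name and value are illustrative only.
 */
#define DEMO_PROT_BTI 0x10

static_assert(DEMO_PROT_BTI != 0, "uapi-style constant must stay visible");

#ifndef DEMO_BUILD_VDSO
/*
 * Kernel-only part: stands in for helpers such as arch_calc_vm_prot_bits()
 * that need kernel headers. Building with -DDEMO_BUILD_VDSO removes it.
 */
static inline unsigned long demo_calc_vm_prot_bits(unsigned long prot)
{
	return (prot & DEMO_PROT_BTI) ? 1UL : 0UL;
}
#define DEMO_KERNEL_PARTS_VISIBLE 1
#else
#define DEMO_KERNEL_PARTS_VISIBLE 0
#endif

/* Reports whether this translation unit saw the kernel-only part. */
static int demo_kernel_parts_visible(void)
{
	return DEMO_KERNEL_PARTS_VISIBLE;
}
```

Compiled normally, the helper is present; compiled with -DDEMO_BUILD_VDSO, only the constant remains, which is the effect the BUILD_VDSO check is meant to have on asm/mman.h.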
On Mon, Sep 02, 2024 at 03:19:56PM +0200, Christophe Leroy wrote: > > > Le 02/09/2024 à 15:11, Jason A. Donenfeld a écrit : > > Hey Christophe (for header logic) & Will (for arm64 stuff), > > > > On Fri, Aug 30, 2024 at 09:28:29AM -0300, Adhemerval Zanella Netto wrote: > >>>> diff --git a/lib/vdso/getrandom.c b/lib/vdso/getrandom.c > >>>> index 938ca539aaa6..7c9711248d9b 100644 > >>>> --- a/lib/vdso/getrandom.c > >>>> +++ b/lib/vdso/getrandom.c > >>>> @@ -5,6 +5,7 @@ > >>>> > >>>> #include <linux/array_size.h> > >>>> #include <linux/minmax.h> > >>>> +#include <linux/mm.h> > >>>> #include <vdso/datapage.h> > >>>> #include <vdso/getrandom.h> > >>>> #include <vdso/unaligned.h> > >>> > >>> Looks like this should be a separate change? > >> > >> > >> It is required so arm64 can use c-getrandom-y, otherwise vgetrandom.o build > >> fails: > >> > >> CC arch/arm64/kernel/vdso/vgetrandom.o > >> In file included from ./include/uapi/linux/mman.h:5, > >> from /mnt/projects/linux/linux-git/lib/vdso/getrandom.c:13, > >> from <command-line>: > >> ./arch/arm64/include/asm/mman.h: In function ‘arch_calc_vm_prot_bits’: > >> ./arch/arm64/include/asm/mman.h:14:13: error: implicit declaration of function ‘system_supports_bti’ [-Werror=implicit-function-declaration] > >> 14 | if (system_supports_bti() && (prot & PROT_BTI)) > >> | ^~~~~~~~~~~~~~~~~~~ > >> ./arch/arm64/include/asm/mman.h:15:24: error: ‘VM_ARM64_BTI’ undeclared (first use in this function); did you mean ‘ARM64_BTI’? 
> >> 15 | ret |= VM_ARM64_BTI; > >> | ^~~~~~~~~~~~ > >> | ARM64_BTI > >> ./arch/arm64/include/asm/mman.h:15:24: note: each undeclared identifier is reported only once for each function it appears in > >> ./arch/arm64/include/asm/mman.h:17:13: error: implicit declaration of function ‘system_supports_mte’ [-Werror=implicit-function-declaration] > >> 17 | if (system_supports_mte() && (prot & PROT_MTE)) > >> | ^~~~~~~~~~~~~~~~~~~ > >> ./arch/arm64/include/asm/mman.h:18:24: error: ‘VM_MTE’ undeclared (first use in this function) > >> 18 | ret |= VM_MTE; > >> | ^~~~~~ > >> ./arch/arm64/include/asm/mman.h: In function ‘arch_calc_vm_flag_bits’: > >> ./arch/arm64/include/asm/mman.h:32:24: error: ‘VM_MTE_ALLOWED’ undeclared (first use in this function) > >> 32 | return VM_MTE_ALLOWED; > >> | ^~~~~~~~~~~~~~ > >> ./arch/arm64/include/asm/mman.h: In function ‘arch_validate_flags’: > >> ./arch/arm64/include/asm/mman.h:59:29: error: ‘VM_MTE’ undeclared (first use in this function) > >> 59 | return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED); > >> | ^~~~~~ > >> ./arch/arm64/include/asm/mman.h:59:52: error: ‘VM_MTE_ALLOWED’ undeclared (first use in this function) > >> 59 | return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED); > >> | ^~~~~~~~~~~~~~ > >> arch/arm64/kernel/vdso/vgetrandom.c: In function ‘__kernel_getrandom’: > >> arch/arm64/kernel/vdso/vgetrandom.c:18:25: error: ‘ENOSYS’ undeclared (first use in this function); did you mean ‘ENOSPC’? > >> 18 | return -ENOSYS; > >> | ^~~~~~ > >> | ENOSPC > >> cc1: some warnings being treated as errors > >> > >> I can move to a different patch, but this is really tied to this patch. > > > > Adhemerval kept this change in this patch for v3, which, if it's > > necessary, is fine with me. But I was looking to see if there was > > another way of doing it, because including linux/mm.h inside of vdso > > code is kind of contrary to your project with e379299fe0b3 ("random: > > vDSO: minimize and simplify header includes"). 
> >
> > getrandom.c includes uapi/linux/mman.h for the mmap constants. That
> > seems fine; it's userspace code after all. But then uapi/linux/mman.h
> > has this:
> >
> >     #include <asm/mman.h>
> >     #include <asm-generic/hugetlb_encode.h>
> >     #include <linux/types.h>
> >
> > The asm-generic/ one resolves to uapi/asm-generic. But the asm/ one
> > resolves to arch code, which is where we then get in trouble on ARM,
> > where arch/arm64/include/asm/mman.h has all sorts of kernel code in it.
> >
> > Maybe, instead, it should resolve to arch/arm64/include/uapi/asm/mman.h,
> > which is the header that userspace actually uses in normal user code?
> >
> > Is this a makefile problem? What's going on here? Seems like this is
> > something worth sorting out. Or I can take Adhemerval's v3 as-is and
> > we'll grit our teeth and work it out later, as you prefer. But I thought
> > I should mention it.
>
> That's a tricky problem, I also have it on powerpc, see patch 5, I
> solved it that way:
>
> In the Makefile:
> -ccflags-y := -fno-common -fno-builtin
> +ccflags-y := -fno-common -fno-builtin -DBUILD_VDSO
>
> In arch/powerpc/include/asm/mman.h:
>
> diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
> index 17a77d47ed6d..42a51a993d94 100644
> --- a/arch/powerpc/include/asm/mman.h
> +++ b/arch/powerpc/include/asm/mman.h
> @@ -6,7 +6,7 @@
>
>  #include <uapi/asm/mman.h>
>
> -#ifdef CONFIG_PPC64
> +#if defined(CONFIG_PPC64) && !defined(BUILD_VDSO)
>
>  #include <asm/cputable.h>
>  #include <linux/mm.h>
>
> So that the only thing that remains in arch/powerpc/include/asm/mman.h
> when building a VDSO is #include <uapi/asm/mman.h>
>
> I got the idea from ARM64, they use something similar in their
> arch/arm64/include/asm/rwonce.h

That seems reasonable enough. Adhemerval - do you want to incorporate
this solution for your v+1? And Will, is it okay to keep that as one
patch, as Christophe has done, rather than splitting it, so the whole
change is hermetic?

Jason
On 02/09/24 10:25, Jason A. Donenfeld wrote: > On Mon, Sep 02, 2024 at 03:19:56PM +0200, Christophe Leroy wrote: >> >> >> Le 02/09/2024 à 15:11, Jason A. Donenfeld a écrit : >>> Hey Christophe (for header logic) & Will (for arm64 stuff), >>> >>> On Fri, Aug 30, 2024 at 09:28:29AM -0300, Adhemerval Zanella Netto wrote: >>>>>> diff --git a/lib/vdso/getrandom.c b/lib/vdso/getrandom.c >>>>>> index 938ca539aaa6..7c9711248d9b 100644 >>>>>> --- a/lib/vdso/getrandom.c >>>>>> +++ b/lib/vdso/getrandom.c >>>>>> @@ -5,6 +5,7 @@ >>>>>> >>>>>> #include <linux/array_size.h> >>>>>> #include <linux/minmax.h> >>>>>> +#include <linux/mm.h> >>>>>> #include <vdso/datapage.h> >>>>>> #include <vdso/getrandom.h> >>>>>> #include <vdso/unaligned.h> >>>>> >>>>> Looks like this should be a separate change? >>>> >>>> >>>> It is required so arm64 can use c-getrandom-y, otherwise vgetrandom.o build >>>> fails: >>>> >>>> CC arch/arm64/kernel/vdso/vgetrandom.o >>>> In file included from ./include/uapi/linux/mman.h:5, >>>> from /mnt/projects/linux/linux-git/lib/vdso/getrandom.c:13, >>>> from <command-line>: >>>> ./arch/arm64/include/asm/mman.h: In function ‘arch_calc_vm_prot_bits’: >>>> ./arch/arm64/include/asm/mman.h:14:13: error: implicit declaration of function ‘system_supports_bti’ [-Werror=implicit-function-declaration] >>>> 14 | if (system_supports_bti() && (prot & PROT_BTI)) >>>> | ^~~~~~~~~~~~~~~~~~~ >>>> ./arch/arm64/include/asm/mman.h:15:24: error: ‘VM_ARM64_BTI’ undeclared (first use in this function); did you mean ‘ARM64_BTI’? 
>>>> 15 | ret |= VM_ARM64_BTI; >>>> | ^~~~~~~~~~~~ >>>> | ARM64_BTI >>>> ./arch/arm64/include/asm/mman.h:15:24: note: each undeclared identifier is reported only once for each function it appears in >>>> ./arch/arm64/include/asm/mman.h:17:13: error: implicit declaration of function ‘system_supports_mte’ [-Werror=implicit-function-declaration] >>>> 17 | if (system_supports_mte() && (prot & PROT_MTE)) >>>> | ^~~~~~~~~~~~~~~~~~~ >>>> ./arch/arm64/include/asm/mman.h:18:24: error: ‘VM_MTE’ undeclared (first use in this function) >>>> 18 | ret |= VM_MTE; >>>> | ^~~~~~ >>>> ./arch/arm64/include/asm/mman.h: In function ‘arch_calc_vm_flag_bits’: >>>> ./arch/arm64/include/asm/mman.h:32:24: error: ‘VM_MTE_ALLOWED’ undeclared (first use in this function) >>>> 32 | return VM_MTE_ALLOWED; >>>> | ^~~~~~~~~~~~~~ >>>> ./arch/arm64/include/asm/mman.h: In function ‘arch_validate_flags’: >>>> ./arch/arm64/include/asm/mman.h:59:29: error: ‘VM_MTE’ undeclared (first use in this function) >>>> 59 | return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED); >>>> | ^~~~~~ >>>> ./arch/arm64/include/asm/mman.h:59:52: error: ‘VM_MTE_ALLOWED’ undeclared (first use in this function) >>>> 59 | return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED); >>>> | ^~~~~~~~~~~~~~ >>>> arch/arm64/kernel/vdso/vgetrandom.c: In function ‘__kernel_getrandom’: >>>> arch/arm64/kernel/vdso/vgetrandom.c:18:25: error: ‘ENOSYS’ undeclared (first use in this function); did you mean ‘ENOSPC’? >>>> 18 | return -ENOSYS; >>>> | ^~~~~~ >>>> | ENOSPC >>>> cc1: some warnings being treated as errors >>>> >>>> I can move to a different patch, but this is really tied to this patch. >>> >>> Adhemerval kept this change in this patch for v3, which, if it's >>> necessary, is fine with me. But I was looking to see if there was >>> another way of doing it, because including linux/mm.h inside of vdso >>> code is kind of contrary to your project with e379299fe0b3 ("random: >>> vDSO: minimize and simplify header includes"). 
>>> >>> getrandom.c includes uapi/linux/mman.h for the mmap constants. That >>> seems fine; it's userspace code after all. But then uapi/linux/mman.h >>> has this: >>> >>> #include <asm/mman.h> >>> #include <asm-generic/hugetlb_encode.h> >>> #include <linux/types.h> >>> >>> The asm-generic/ one resolves to uapi/asm-generic. But the asm/ one >>> resolves to arch code, which is where we then get in trouble on ARM, >>> where arch/arm64/include/asm/mman.h has all sorts of kernel code in it. >>> >>> Maybe, instead, it should resolve to arch/arm64/include/uapi/asm/mman.h, >>> which is the header that userspace actually uses in normal user code? >>> >>> Is this a makefile problem? What's going on here? Seems like this is >>> something worth sorting out. Or I can take Adhemerval's v3 as-is and >>> we'll grit our teeth and work it out later, as you prefer. But I thought >>> I should mention it. >> >> That's a tricky problem, I also have it on powerpc, see patch 5, I >> solved it that way: >> >> In the Makefile: >> -ccflags-y := -fno-common -fno-builtin >> +ccflags-y := -fno-common -fno-builtin -DBUILD_VDSO >> >> In arch/powerpc/include/asm/mman.h: >> >> diff --git a/arch/powerpc/include/asm/mman.h >> b/arch/powerpc/include/asm/mman.h >> index 17a77d47ed6d..42a51a993d94 100644 >> --- a/arch/powerpc/include/asm/mman.h >> +++ b/arch/powerpc/include/asm/mman.h >> @@ -6,7 +6,7 @@ >> >> #include <uapi/asm/mman.h> >> >> -#ifdef CONFIG_PPC64 >> +#if defined(CONFIG_PPC64) && !defined(BUILD_VDSO) >> >> #include <asm/cputable.h> >> #include <linux/mm.h> >> >> So that the only thing that remains in arch/powerpc/include/asm/mman.h >> when building a VDSO is #include <uapi/asm/mman.h> >> >> I got the idea from ARM64, they use something similar in their >> arch/arm64/include/asm/rwonce.h > > That seems reasonable enough. Adhemerval - do you want to incorporate > this solution for your v+1? 
> And Will, is it okay to keep that as one patch, as Christophe has
> done, rather than splitting it, so the whole change is hermetic?

Sure, I will do it for v4.
On Mon, Sep 02, 2024 at 03:25:34PM +0200, Jason A. Donenfeld wrote:
> On Mon, Sep 02, 2024 at 03:19:56PM +0200, Christophe Leroy wrote:
> > diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
> > index 17a77d47ed6d..42a51a993d94 100644
> > --- a/arch/powerpc/include/asm/mman.h
> > +++ b/arch/powerpc/include/asm/mman.h
> > @@ -6,7 +6,7 @@
> >
> >  #include <uapi/asm/mman.h>
> >
> > -#ifdef CONFIG_PPC64
> > +#if defined(CONFIG_PPC64) && !defined(BUILD_VDSO)
> >
> >  #include <asm/cputable.h>
> >  #include <linux/mm.h>
> >
> > So that the only thing that remains in arch/powerpc/include/asm/mman.h
> > when building a VDSO is #include <uapi/asm/mman.h>
> >
> > I got the idea from ARM64, they use something similar in their
> > arch/arm64/include/asm/rwonce.h
>
> That seems reasonable enough. Adhemerval - do you want to incorporate
> this solution for your v+1? And Will, is it okay to keep that as one
> patch, as Christophe has done, rather than splitting it, so the whole
> change is hermetic?

Yup, that makes sense to me (and the lib/vdso/getrandom.c change would go
away entirely).

Will
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index a2f8ff354ca6..7f7424d1b3b8 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -262,6 +262,7 @@ config ARM64 select TRACE_IRQFLAGS_NMI_SUPPORT select HAVE_SOFTIRQ_ON_OWN_STACK select USER_STACKTRACE_SUPPORT + select VDSO_GETRANDOM help ARM 64-bit (AArch64) Linux support. diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h index 4305995c8f82..18407b757c95 100644 --- a/arch/arm64/include/asm/vdso.h +++ b/arch/arm64/include/asm/vdso.h @@ -16,6 +16,12 @@ #ifndef __ASSEMBLY__ +enum vvar_pages { + VVAR_DATA_PAGE_OFFSET, + VVAR_TIMENS_PAGE_OFFSET, + VVAR_NR_PAGES, +}; + #include <generated/vdso-offsets.h> #define VDSO_SYMBOL(base, name) \ diff --git a/arch/arm64/include/asm/vdso/getrandom.h b/arch/arm64/include/asm/vdso/getrandom.h new file mode 100644 index 000000000000..fca66ba49d4c --- /dev/null +++ b/arch/arm64/include/asm/vdso/getrandom.h @@ -0,0 +1,49 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __ASM_VDSO_GETRANDOM_H +#define __ASM_VDSO_GETRANDOM_H + +#ifndef __ASSEMBLY__ + +#include <asm/vdso.h> +#include <asm/unistd.h> +#include <vdso/datapage.h> + +/** + * getrandom_syscall - Invoke the getrandom() syscall. + * @buffer: Destination buffer to fill with random bytes. + * @len: Size of @buffer in bytes. + * @flags: Zero or more GRND_* flags. + * Returns: The number of random bytes written to @buffer, or a negative value indicating an error. 
+ */
+static __always_inline ssize_t getrandom_syscall(void *_buffer, size_t _len, unsigned int _flags)
+{
+	register void *buffer asm ("x0") = _buffer;
+	register size_t len asm ("x1") = _len;
+	register unsigned int flags asm ("x2") = _flags;
+	register long ret asm ("x0");
+	register long nr asm ("x8") = __NR_getrandom;
+
+	asm volatile(
+	"       svc #0\n"
+	: "=r" (ret)
+	: "r" (buffer), "r" (len), "r" (flags), "r" (nr)
+	: "memory");
+
+	return ret;
+}
+
+static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void)
+{
+	/*
+	 * If a task belongs to a time namespace then the real VVAR page is
+	 * mapped with the VVAR_TIMENS_PAGE_OFFSET offset.
+	 */
+	if (IS_ENABLED(CONFIG_TIME_NS) && _vdso_data->clock_mode == VDSO_CLOCKMODE_TIMENS)
+		return (void*)&_vdso_rng_data + VVAR_TIMENS_PAGE_OFFSET * PAGE_SIZE;
+	return &_vdso_rng_data;
+}
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_GETRANDOM_H */
diff --git a/arch/arm64/include/asm/vdso/vsyscall.h b/arch/arm64/include/asm/vdso/vsyscall.h
index f94b1457c117..2a87f0e1b144 100644
--- a/arch/arm64/include/asm/vdso/vsyscall.h
+++ b/arch/arm64/include/asm/vdso/vsyscall.h
@@ -2,8 +2,11 @@
 #ifndef __ASM_VDSO_VSYSCALL_H
 #define __ASM_VDSO_VSYSCALL_H

+#define __VDSO_RND_DATA_OFFSET 480
+
 #ifndef __ASSEMBLY__

+#include <asm/vdso.h>
 #include <linux/timekeeper_internal.h>
 #include <vdso/datapage.h>

@@ -21,6 +24,13 @@ struct vdso_data *__arm64_get_k_vdso_data(void)
 }
 #define __arch_get_k_vdso_data __arm64_get_k_vdso_data

+static __always_inline
+struct vdso_rng_data *__arm64_get_k_vdso_rnd_data(void)
+{
+	return (void*)vdso_data + __VDSO_RND_DATA_OFFSET;
+}
+#define __arch_get_k_vdso_rng_data __arm64_get_k_vdso_rnd_data
+
 static __always_inline
 void __arm64_update_vsyscall(struct vdso_data *vdata, struct timekeeper *tk)
 {
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 89b6e7840002..706c9c3a7a50 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -34,12 +34,6 @@
enum vdso_abi { VDSO_ABI_AA32, }; -enum vvar_pages { - VVAR_DATA_PAGE_OFFSET, - VVAR_TIMENS_PAGE_OFFSET, - VVAR_NR_PAGES, -}; - struct vdso_abi_info { const char *name; const char *vdso_code_start; diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile index d11da6461278..50246a38d6bd 100644 --- a/arch/arm64/kernel/vdso/Makefile +++ b/arch/arm64/kernel/vdso/Makefile @@ -9,7 +9,7 @@ # Include the generic Makefile to check the built vdso. include $(srctree)/lib/vdso/Makefile -obj-vdso := vgettimeofday.o note.o sigreturn.o +obj-vdso := vgettimeofday.o note.o sigreturn.o vgetrandom.o vgetrandom-chacha.o # Build rules targets := $(obj-vdso) vdso.so vdso.so.dbg @@ -40,13 +40,22 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) \ $(RANDSTRUCT_CFLAGS) $(GCC_PLUGINS_CFLAGS) \ $(CC_FLAGS_LTO) $(CC_FLAGS_CFI) \ -Wmissing-prototypes -Wmissing-declarations +CFLAGS_REMOVE_vgetrandom.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) \ + $(RANDSTRUCT_CFLAGS) $(GCC_PLUGINS_CFLAGS) \ + $(CC_FLAGS_LTO) $(CC_FLAGS_CFI) \ + -Wmissing-prototypes -Wmissing-declarations CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables +CFLAGS_vgetrandom.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables ifneq ($(c-gettimeofday-y),) CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y) endif +ifneq ($(c-getrandom-y),) + CFLAGS_vgetrandom.o += -include $(c-getrandom-y) +endif + targets += vdso.lds CPPFLAGS_vdso.lds += -P -C -U$(ARCH) diff --git a/arch/arm64/kernel/vdso/vdso b/arch/arm64/kernel/vdso/vdso new file mode 120000 index 000000000000..233c7a26f6e5 --- /dev/null +++ b/arch/arm64/kernel/vdso/vdso @@ -0,0 +1 @@ +../../../arch/arm64/kernel/vdso \ No newline at end of file diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S index 45354f2ddf70..f204a9ddc833 100644 --- a/arch/arm64/kernel/vdso/vdso.lds.S +++ b/arch/arm64/kernel/vdso/vdso.lds.S @@ -11,7 +11,9 @@ #include <linux/const.h> #include <asm/page.h> 
#include <asm/vdso.h> +#include <asm/vdso/vsyscall.h> #include <asm-generic/vmlinux.lds.h> +#include <vdso/datapage.h> OUTPUT_FORMAT("elf64-littleaarch64", "elf64-bigaarch64", "elf64-littleaarch64") OUTPUT_ARCH(aarch64) @@ -19,6 +21,7 @@ OUTPUT_ARCH(aarch64) SECTIONS { PROVIDE(_vdso_data = . - __VVAR_PAGES * PAGE_SIZE); + PROVIDE(_vdso_rng_data = _vdso_data + __VDSO_RND_DATA_OFFSET); #ifdef CONFIG_TIME_NS PROVIDE(_timens_data = _vdso_data + PAGE_SIZE); #endif @@ -102,6 +105,7 @@ VERSION __kernel_gettimeofday; __kernel_clock_gettime; __kernel_clock_getres; + __kernel_getrandom; local: *; }; } diff --git a/arch/arm64/kernel/vdso/vgetrandom-chacha.S b/arch/arm64/kernel/vdso/vgetrandom-chacha.S new file mode 100644 index 000000000000..9ebf12a09c65 --- /dev/null +++ b/arch/arm64/kernel/vdso/vgetrandom-chacha.S @@ -0,0 +1,168 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/linkage.h> +#include <asm/cache.h> +#include <asm/assembler.h> + + .text + +#define state0 v0 +#define state1 v1 +#define state2 v2 +#define state3 v3 +#define copy0 v4 +#define copy1 v5 +#define copy2 v6 +#define copy3 v7 +#define copy3_d d7 +#define one_d d16 +#define one_q q16 +#define tmp v17 +#define rot8 v18 + +/* + * ARM64 ChaCha20 implementation meant for vDSO. Produces a given positive + * number of blocks of output with nonce 0, taking an input key and 8-bytes + * counter. Importantly does not spill to the stack. 
+ * + * void __arch_chacha20_blocks_nostack(uint8_t *dst_bytes, + * const uint8_t *key, + * uint32_t *counter, + * size_t nblocks) + * + * x0: output bytes + * x1: 32-byte key input + * x2: 8-byte counter input/output + * x3: number of 64-byte block to write to output + */ +SYM_FUNC_START(__arch_chacha20_blocks_nostack) + + /* copy0 = "expand 32-byte k" */ + adr_l x8, CTES + ld1 {copy0.4s}, [x8] + /* copy1,copy2 = key */ + ld1 { copy1.4s, copy2.4s }, [x1] + /* copy3 = counter || zero nonce */ + ldr copy3_d, [x2] + + adr_l x8, ONE + ldr one_q, [x8] + + adr_l x10, ROT8 + ld1 {rot8.4s}, [x10] +.Lblock: + /* copy state to auxiliary vectors for the final add after the permute. */ + mov state0.16b, copy0.16b + mov state1.16b, copy1.16b + mov state2.16b, copy2.16b + mov state3.16b, copy3.16b + + mov w4, 20 +.Lpermute: + /* + * Permute one 64-byte block where the state matrix is stored in the four NEON + * registers state0-state3. It performs matrix operations on four words in parallel, + * but requires shuffling to rearrange the words after each round. 
+	 */
+
+.Ldoubleround:
+	/* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */
+	add		state0.4s, state0.4s, state1.4s
+	eor		state3.16b, state3.16b, state0.16b
+	rev32		state3.8h, state3.8h
+
+	/* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */
+	add		state2.4s, state2.4s, state3.4s
+	eor		tmp.16b, state1.16b, state2.16b
+	shl		state1.4s, tmp.4s, #12
+	sri		state1.4s, tmp.4s, #20
+
+	/* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */
+	add		state0.4s, state0.4s, state1.4s
+	eor		state3.16b, state3.16b, state0.16b
+	tbl		state3.16b, {state3.16b}, rot8.16b
+
+	/* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */
+	add		state2.4s, state2.4s, state3.4s
+	eor		tmp.16b, state1.16b, state2.16b
+	shl		state1.4s, tmp.4s, #7
+	sri		state1.4s, tmp.4s, #25
+
+	/* state1[0,1,2,3] = state1[1,2,3,0] */
+	ext		state1.16b, state1.16b, state1.16b, #4
+	/* state2[0,1,2,3] = state2[2,3,0,1] */
+	ext		state2.16b, state2.16b, state2.16b, #8
+	/* state3[0,1,2,3] = state3[3,0,1,2] */
+	ext		state3.16b, state3.16b, state3.16b, #12
+
+	/* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */
+	add		state0.4s, state0.4s, state1.4s
+	eor		state3.16b, state3.16b, state0.16b
+	rev32		state3.8h, state3.8h
+
+	/* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */
+	add		state2.4s, state2.4s, state3.4s
+	eor		tmp.16b, state1.16b, state2.16b
+	shl		state1.4s, tmp.4s, #12
+	sri		state1.4s, tmp.4s, #20
+
+	/* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */
+	add		state0.4s, state0.4s, state1.4s
+	eor		state3.16b, state3.16b, state0.16b
+	tbl		state3.16b, {state3.16b}, rot8.16b
+
+	/* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */
+	add		state2.4s, state2.4s, state3.4s
+	eor		tmp.16b, state1.16b, state2.16b
+	shl		state1.4s, tmp.4s, #7
+	sri		state1.4s, tmp.4s, #25
+
+	/* state1[0,1,2,3] = state1[3,0,1,2] */
+	ext		state1.16b, state1.16b, state1.16b, #12
+	/* state2[0,1,2,3] = state2[2,3,0,1] */
+	ext		state2.16b, state2.16b, state2.16b, #8
+	/* state3[0,1,2,3] = state3[1,2,3,0] */
+	ext		state3.16b, state3.16b, state3.16b, #4
+
+	subs		w4, w4, #2
+	b.ne		.Ldoubleround
+
+	/* output0 = state0 + copy0 */
+	add		state0.4s, state0.4s, copy0.4s
+	/* output1 = state1 + copy1 */
+	add		state1.4s, state1.4s, copy1.4s
+	/* output2 = state2 + copy2 */
+	add		state2.4s, state2.4s, copy2.4s
+	/* output3 = state3 + copy3 */
+	add		state3.4s, state3.4s, copy3.4s
+	st1		{ state0.4s - state3.4s }, [x0]
+
+	/* ++copy3.counter */
+	add		copy3_d, copy3_d, one_d
+
+	/* output += 64, --nblocks */
+	add		x0, x0, 64
+	subs		x3, x3, #1
+	b.ne		.Lblock
+
+	/* counter = copy3.counter */
+	str		copy3_d, [x2]
+
+	/* Zero out the potentially sensitive regs, in case nothing uses these again. */
+	eor		state0.16b, state0.16b, state0.16b
+	eor		state1.16b, state1.16b, state1.16b
+	eor		state2.16b, state2.16b, state2.16b
+	eor		state3.16b, state3.16b, state3.16b
+	eor		copy1.16b, copy1.16b, copy1.16b
+	eor		copy2.16b, copy2.16b, copy2.16b
+	ret
+SYM_FUNC_END(__arch_chacha20_blocks_nostack)
+
+	.section	".rodata", "a", %progbits
+	.align		L1_CACHE_SHIFT
+
+CTES:	.word	1634760805, 857760878, 2036477234, 1797285236
+ONE:	.xword	1, 0
+ROT8:	.word	0x02010003, 0x06050407, 0x0a09080b, 0x0e0d0c0f
+
+emit_aarch64_feature_1_and
diff --git a/arch/arm64/kernel/vdso/vgetrandom.c b/arch/arm64/kernel/vdso/vgetrandom.c
new file mode 100644
index 000000000000..0833d25f3121
--- /dev/null
+++ b/arch/arm64/kernel/vdso/vgetrandom.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+
+typeof(__cvdso_getrandom) __kernel_getrandom;
+
+ssize_t __kernel_getrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state, size_t opaque_len)
+{
+	asm goto (
+	ALTERNATIVE("b %[fallback]", "nop", ARM64_HAS_FPSIMD) : : : : fallback);
+	return __cvdso_getrandom(buffer, len, flags, opaque_state, opaque_len);
+
+fallback:
+	if (unlikely(opaque_len == ~0UL && !buffer && !len && !flags))
+		return -ENOSYS;
+	return getrandom_syscall(buffer, len, flags);
+}
diff --git a/lib/vdso/getrandom.c
b/lib/vdso/getrandom.c index 938ca539aaa6..7c9711248d9b 100644 --- a/lib/vdso/getrandom.c +++ b/lib/vdso/getrandom.c @@ -5,6 +5,7 @@ #include <linux/array_size.h> #include <linux/minmax.h> +#include <linux/mm.h> #include <vdso/datapage.h> #include <vdso/getrandom.h> #include <vdso/unaligned.h> diff --git a/tools/arch/arm64/vdso b/tools/arch/arm64/vdso new file mode 120000 index 000000000000..233c7a26f6e5 --- /dev/null +++ b/tools/arch/arm64/vdso @@ -0,0 +1 @@ +../../../arch/arm64/kernel/vdso \ No newline at end of file diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h index 6f7f22ac9da5..4366da278033 100644 --- a/tools/include/linux/compiler.h +++ b/tools/include/linux/compiler.h @@ -2,6 +2,8 @@ #ifndef _TOOLS_LINUX_COMPILER_H_ #define _TOOLS_LINUX_COMPILER_H_ +#ifndef __ASSEMBLY__ + #include <linux/compiler_types.h> #ifndef __compiletime_error @@ -224,4 +226,6 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s __asm__ ("" : "=r" (var) : "0" (var)) #endif +#endif /* __ASSEMBLY__ */ + #endif /* _TOOLS_LINUX_COMPILER_H */ diff --git a/tools/testing/selftests/vDSO/Makefile b/tools/testing/selftests/vDSO/Makefile index e21e78aae24d..29b4ac928e0b 100644 --- a/tools/testing/selftests/vDSO/Makefile +++ b/tools/testing/selftests/vDSO/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 uname_M := $(shell uname -m 2>/dev/null || echo not) -ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/) +ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/ -e s/aarch64.*/arm64/) TEST_GEN_PROGS := vdso_test_gettimeofday TEST_GEN_PROGS += vdso_test_getcpu @@ -10,7 +10,7 @@ ifeq ($(ARCH),$(filter $(ARCH),x86 x86_64)) TEST_GEN_PROGS += vdso_standalone_test_x86 endif TEST_GEN_PROGS += vdso_test_correctness -ifeq ($(uname_M),x86_64) +ifeq ($(uname_M), $(filter x86_64 aarch64, $(uname_M))) TEST_GEN_PROGS += vdso_test_getrandom TEST_GEN_PROGS += vdso_test_chacha endif @@ -41,5 +41,6 @@ 
$(OUTPUT)/vdso_test_getrandom: CFLAGS += -isystem $(top_srcdir)/tools/include \ $(OUTPUT)/vdso_test_chacha: $(top_srcdir)/tools/arch/$(ARCH)/vdso/vgetrandom-chacha.S $(OUTPUT)/vdso_test_chacha: CFLAGS += -idirafter $(top_srcdir)/tools/include \ -idirafter $(top_srcdir)/arch/$(ARCH)/include \ + -idirafter $(top_srcdir)/arch/$(ARCH)/include/generated \ -idirafter $(top_srcdir)/include \ -D__ASSEMBLY__ -Wa,--noexecstack
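For readers following the tail of __arch_chacha20_blocks_nostack above, the per-block flow (20 rounds, feed-forward add of the saved input, 64-byte store, 64-bit counter increment, counter written back) can be sketched in portable C. This is only an illustration, not the patch's code: the name chacha20_blocks_ref is made up, the argument order (dst, key, counter, nblocks) is assumed from the register usage visible in the assembly, a zero nonce and little-endian output are assumed, and unlike the vDSO routine this version freely uses the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static inline uint32_t rotl32(uint32_t v, int n)
{
	return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter round on four words of the 16-word state. */
#define QUARTERROUND(s, a, b, c, d) do {			\
	s[a] += s[b]; s[d] = rotl32(s[d] ^ s[a], 16);		\
	s[c] += s[d]; s[b] = rotl32(s[b] ^ s[c], 12);		\
	s[a] += s[b]; s[d] = rotl32(s[d] ^ s[a], 8);		\
	s[c] += s[d]; s[b] = rotl32(s[b] ^ s[c], 7);		\
} while (0)

/*
 * Hypothetical reference helper (not from the patch). counter[0] is the
 * low 32 bits of the 64-bit block counter, counter[1] the high 32 bits.
 */
static void chacha20_blocks_ref(uint8_t *dst, const uint32_t key[8],
				uint32_t counter[2], size_t nblocks)
{
	assert(dst && key && counter);

	while (nblocks--) {
		/* "copy" rows: constants (CTES), key, counter plus zero nonce. */
		uint32_t copy[16] = {
			1634760805, 857760878, 2036477234, 1797285236,
			key[0], key[1], key[2], key[3],
			key[4], key[5], key[6], key[7],
			counter[0], counter[1], 0, 0
		};
		uint32_t state[16];

		memcpy(state, copy, sizeof(state));

		/* 10 double rounds: 4 column rounds, then 4 diagonal rounds. */
		for (int i = 0; i < 20; i += 2) {
			QUARTERROUND(state, 0, 4, 8, 12);
			QUARTERROUND(state, 1, 5, 9, 13);
			QUARTERROUND(state, 2, 6, 10, 14);
			QUARTERROUND(state, 3, 7, 11, 15);
			QUARTERROUND(state, 0, 5, 10, 15);
			QUARTERROUND(state, 1, 6, 11, 12);
			QUARTERROUND(state, 2, 7, 8, 13);
			QUARTERROUND(state, 3, 4, 9, 14);
		}

		/* outputN = stateN + copyN, stored as little-endian words. */
		for (int i = 0; i < 16; i++) {
			uint32_t v = state[i] + copy[i];

			dst[4 * i + 0] = (uint8_t)v;
			dst[4 * i + 1] = (uint8_t)(v >> 8);
			dst[4 * i + 2] = (uint8_t)(v >> 16);
			dst[4 * i + 3] = (uint8_t)(v >> 24);
		}
		dst += 64;

		/* ++counter, carrying into the high word. */
		if (++counter[0] == 0)
			++counter[1];
	}
}
```

With an all-zero key and a zero counter, the first output bytes should match the well-known ChaCha20 zero-key test vector (76 b8 e0 ad ...), which makes a sketch like this handy for cross-checking a selftest by hand.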
Hook up the generic vDSO implementation to the aarch64 vDSO data page.
The _vdso_rng_data required data is placed within the _vdso_data vvar
page, by using an offset larger than the vdso_data.

The vDSO function requires a ChaCha20 implementation that does not
write to the stack, and that can do an entire ChaCha20 permutation.
The one provided is based on the current chacha-neon-core.S and uses NEON
on the permute operation. The fallback for chips that do not support
NEON issues the syscall.

This also passes the vdso_test_chacha test along with
vdso_test_getrandom. The vdso_test_getrandom bench-single result on
Neoverse-N1 shows:

   vdso: 25000000 times in 0.746506464 seconds
   libc: 25000000 times in 8.849179444 seconds
syscall: 25000000 times in 8.818726425 seconds

Changes from v1:
- Fixed style issues and typos.
- Added fallback for systems without NEON support.
- Avoid use of non-volatile vector registers in neon chacha20.
- Use c-getrandom-y for vgetrandom.c.
- Fixed TIMENS vdso_rnd_data access.

Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
---
 arch/arm64/Kconfig                         |   1 +
 arch/arm64/include/asm/vdso.h              |   6 +
 arch/arm64/include/asm/vdso/getrandom.h    |  49 ++++++
 arch/arm64/include/asm/vdso/vsyscall.h     |  10 ++
 arch/arm64/kernel/vdso.c                   |   6 -
 arch/arm64/kernel/vdso/Makefile            |  11 +-
 arch/arm64/kernel/vdso/vdso                |   1 +
 arch/arm64/kernel/vdso/vdso.lds.S          |   4 +
 arch/arm64/kernel/vdso/vgetrandom-chacha.S | 168 +++++++++++++++++++++
 arch/arm64/kernel/vdso/vgetrandom.c        |  15 ++
 lib/vdso/getrandom.c                       |   1 +
 tools/arch/arm64/vdso                      |   1 +
 tools/include/linux/compiler.h             |   4 +
 tools/testing/selftests/vDSO/Makefile      |   5 +-
 14 files changed, 273 insertions(+), 9 deletions(-)
 create mode 100644 arch/arm64/include/asm/vdso/getrandom.h
 create mode 120000 arch/arm64/kernel/vdso/vdso
 create mode 100644 arch/arm64/kernel/vdso/vgetrandom-chacha.S
 create mode 100644 arch/arm64/kernel/vdso/vgetrandom.c
 create mode 120000 tools/arch/arm64/vdso
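To put the bench-single figures from the commit message in per-call terms, the arithmetic can be sketched as below. The helper names ns_per_call, speedup and print_bench_summary are made up for illustration; the only inputs are the three timings quoted in the commit message, and the results are averages over 25000000 calls on that one Neoverse-N1 run, nothing more.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: average cost of one call, in nanoseconds. */
static double ns_per_call(double total_seconds, double calls)
{
	assert(calls > 0.0);
	return total_seconds * 1e9 / calls;
}

/* How many times longer the slow path took for the same number of calls. */
static double speedup(double slow_seconds, double fast_seconds)
{
	assert(fast_seconds > 0.0);
	return slow_seconds / fast_seconds;
}

/* Prints a summary using the timings quoted in the commit message. */
static void print_bench_summary(void)
{
	const double calls = 25000000.0;
	const double vdso = 0.746506464;
	const double libc = 8.849179444;
	const double sys = 8.818726425;

	printf("vdso:    %.2f ns/call\n", ns_per_call(vdso, calls));
	printf("libc:    %.2f ns/call\n", ns_per_call(libc, calls));
	printf("syscall: %.2f ns/call\n", ns_per_call(sys, calls));
	printf("vdso vs syscall: %.2fx\n", speedup(sys, vdso));
	printf("vdso vs libc:    %.2fx\n", speedup(libc, vdso));
}
```

Calling print_bench_summary() from a main() works out to roughly 30 ns per vDSO call against roughly 350 ns for the syscall and libc paths, a bit under a 12x difference on that machine.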