
[v5] LoongArch: vDSO: Wire up getrandom() vDSO implementation

Message ID 20240829125656.19017-1-xry111@xry111.site (mailing list archive)
State Not Applicable
Delegated to: Herbert Xu
Headers show
Series [v5] LoongArch: vDSO: Wire up getrandom() vDSO implementation

Commit Message

Xi Ruoyao Aug. 29, 2024, 12:56 p.m. UTC
Hook up the generic vDSO implementation to the LoongArch vDSO data page
by providing the required __arch_chacha20_blocks_nostack,
__arch_get_k_vdso_rng_data, and getrandom_syscall implementations.

Also enable the vDSO getrandom tests for LoongArch: create the symlink
to the arch/loongarch/vdso directory, and correctly set the ARCH
variable for LoongArch.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
---

Cc: linux-crypto@vger.kernel.org
Cc: loongarch@lists.linux.dev
Cc: linux-kernel@vger.kernel.org
Cc: Jinyang He <hejinyang@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>

Passed vdso_test_getrandom and vdso_test_chacha tests.  Benchmark
results:

vdso_test_getrandom bench-single:

       vdso: 25000000 times in 0.499490289 seconds
       libc: 25000000 times in 6.963829873 seconds
    syscall: 25000000 times in 6.983413486 seconds

vdso_test_getrandom bench-multi:
       vdso: 25000000 x 256 times in 28.703710823 seconds
       libc: 25000000 x 256 times in 356.835801784 seconds
       syscall: 25000000 x 256 times in 338.525837197 seconds

[v4]->v5:
- Rebase onto crng/random.git:
  - Remove two selftest patches.
  - Remove __arch_chacha20_blocks_nostack forward declaration.
- Squash the remaining selftest patch into the vDSO getrandom
  implementation patch.
- Remove ifdef CONFIG_VDSO_GETRANDOM and $(CONFIG_VDSO_GETRANDOM) as
  they are always true in arch/loongarch.
- Remove the asm-offsets.c change, which was already unnecessary as of v4.
- Add comment about rematerializing the constant in the assembly code.
- Add a prototype for __vdso_getrandom to silence a -Wmissing-prototypes
  warning.

[v3]->v4:
- Remove LSX implementation, which isn't much faster than the generic
  implementation.
- Rebase onto crng/random.git:
  - Define __arch_get_k_vdso_rng_data instead of using inline asm to
    provide the _vdso_rng_data symbol in a magic way.
  - Remove memset.S.
  - Use c-getrandom-y to easily include the generic C code.
  - The benchmark results seem better than v3, maybe related to the TLS
    refactoring in random.git.
- Add patches for selftests.

[v2]->v3:
- Add a generic LoongArch implementation for which LSX isn't needed.

v1->v2:
- Properly send the series to the list.

[v4]:https://lore.kernel.org/all/20240827132018.88854-1-xry111@xry111.site/
[v3]:https://lore.kernel.org/all/20240816110717.10249-1-xry111@xry111.site/
[v2]:https://lore.kernel.org/all/20240815133357.35829-1-xry111@xry111.site/

 arch/loongarch/Kconfig                      |   1 +
 arch/loongarch/include/asm/vdso/getrandom.h |  44 ++++
 arch/loongarch/include/asm/vdso/vdso.h      |   6 +
 arch/loongarch/include/asm/vdso/vsyscall.h  |   8 +
 arch/loongarch/kernel/asm-offsets.c         |   1 +
 arch/loongarch/kernel/vdso.c                |   2 +
 arch/loongarch/vdso/Makefile                |   7 +-
 arch/loongarch/vdso/vdso.lds.S              |   1 +
 arch/loongarch/vdso/vgetrandom-chacha.S     | 242 ++++++++++++++++++++
 arch/loongarch/vdso/vgetrandom.c            |  15 ++
 tools/arch/loongarch/vdso                   |   1 +
 tools/testing/selftests/vDSO/Makefile       |   4 +-
 12 files changed, 329 insertions(+), 3 deletions(-)
 create mode 100644 arch/loongarch/include/asm/vdso/getrandom.h
 create mode 100644 arch/loongarch/vdso/vgetrandom-chacha.S
 create mode 100644 arch/loongarch/vdso/vgetrandom.c
 create mode 120000 tools/arch/loongarch/vdso


base-commit: ec309dd126280ca4ef4087e02246a6f61f36c8d1

Comments

Xi Ruoyao Aug. 29, 2024, 1:01 p.m. UTC | #1
On Thu, 2024-08-29 at 20:56 +0800, Xi Ruoyao wrote:
> diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c

Oops, stupid me.  Please remove this hunk when applying the patch.

I'll remove this in v6 if needed.

> index bee9f7a3108f..ab258878d551 100644
> --- a/arch/loongarch/kernel/asm-offsets.c
> +++ b/arch/loongarch/kernel/asm-offsets.c
> @@ -14,6 +14,7 @@
>  #include <asm/ptrace.h>
>  #include <asm/processor.h>
>  #include <asm/ftrace.h>
> +#include <asm/vdso/vdso.h>
>  
>  static void __used output_ptreg_defines(void)
>  {
Jason A. Donenfeld Aug. 29, 2024, 1:05 p.m. UTC | #2
On Thu, Aug 29, 2024 at 09:01:05PM +0800, Xi Ruoyao wrote:
> On Thu, 2024-08-29 at 20:56 +0800, Xi Ruoyao wrote:
> > diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c
> 
> Oops, stupid me.  Please remove this hunk when applying the patch.
> 
> I'll remove this in v6 if needed.

While we're on the topic of herps and derps,

> Signed-off-by: Xi Ruoyao <xry111@xry111.site>
> ---
> 
> Cc: linux-crypto@vger.kernel.org
> Cc: loongarch@lists.linux.dev
> Cc: linux-kernel@vger.kernel.org
> Cc: Jinyang He <hejinyang@loongson.cn>
> Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>

Typically all of the Cc tags go before your Signed-off-by and enter the
git history of the commit. "Here's who received this letter" sort of
documentation. (But not a big deal and something I can fix up upon
commit, of course.)
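
I.e., in the next revision, the trailer block would look roughly like:

    Cc: linux-crypto@vger.kernel.org
    [...]
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Signed-off-by: Xi Ruoyao <xry111@xry111.site>
    ---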
Jason A. Donenfeld Aug. 29, 2024, 1:13 p.m. UTC | #3
Hi Huacai & Ruoyao,

On Thu, Aug 29, 2024 at 08:56:55PM +0800, Xi Ruoyao wrote:
> Hook up the generic vDSO implementation to the LoongArch vDSO data page
> by providing the required __arch_chacha20_blocks_nostack,
> __arch_get_k_vdso_rng_data, and getrandom_syscall implementations.
> 
> Also enable the vDSO getrandom tests for LoongArch: create the symlink
> to the arch/loongarch/vdso directory, and correctly set the ARCH
> variable for LoongArch.
> 
> Signed-off-by: Xi Ruoyao <xry111@xry111.site>
> ---
>
> Passed vdso_test_getrandom and vdso_test_chacha tests.  Benchmark
> results:
> 
> vdso_test_getrandom bench-single:
> 
>        vdso: 25000000 times in 0.499490289 seconds
>        libc: 25000000 times in 6.963829873 seconds
>     syscall: 25000000 times in 6.983413486 seconds
> 
> vdso_test_getrandom bench-multi:
>        vdso: 25000000 x 256 times in 28.703710823 seconds
>        libc: 25000000 x 256 times in 356.835801784 seconds
>        syscall: 25000000 x 256 times in 338.525837197 seconds
 
Thanks for including the test results. I've also verified the
vdso_test_chacha and can confirm that passes:

zx2c4@thinkpad ~/Projects/random-linux/tools/testing/selftests/vDSO $  /home/zx2c4/Downloads/Chrome/cross-tools/bin/loongarch64-unknown-linux-gnu-gcc -std=gnu99 -D_GNU_SOURCE= -idirafter /home/zx2c4/Projects/random-linux/tools/testing/selftests/../../../tools/include -idirafter /home/zx2c4/Projects/random-linux/tools/testing/selftests/../../../arch/loongarch/include -idirafter /home/zx2c4/Projects/random-linux/tools/testing/selftests/../../../include -D__ASSEMBLY__ -Wa,--noexecstack    vdso_test_chacha.c /home/zx2c4/Projects/random-linux/tools/testing/selftests/../../../tools/arch/loongarch/vdso/vgetrandom-chacha.S  -o /home/zx2c4/Projects/random-linux/tools/testing/selftests/vDSO/vdso_test_chacha -static
zx2c4@thinkpad ~/Projects/random-linux/tools/testing/selftests/vDSO $ qemu-loongarch64 /home/zx2c4/Projects/random-linux/tools/testing/selftests/vDSO/vdso_test_chacha
TAP version 13
1..1
ok 1 chacha: PASS

So, with the asm-offsets.c snippet removed as you instructed, this is
fine for me. I'll happily queue this up for 6.12, but I'd like Huacai's
Acked-by first.

Huacai, does this look okay to you?

Thanks,
Jason
Huacai Chen Aug. 29, 2024, 1:18 p.m. UTC | #4
Hi, Ruoyao,

On Thu, Aug 29, 2024 at 8:58 PM Xi Ruoyao <xry111@xry111.site> wrote:
>
> Hook up the generic vDSO implementation to the LoongArch vDSO data page
> by providing the required __arch_chacha20_blocks_nostack,
> __arch_get_k_vdso_rng_data, and getrandom_syscall implementations.
>
> Also enable the vDSO getrandom tests for LoongArch: create the symlink
> to the arch/loongarch/vdso directory, and correctly set the ARCH
> variable for LoongArch.
>
> Signed-off-by: Xi Ruoyao <xry111@xry111.site>
> ---
>
> Cc: linux-crypto@vger.kernel.org
> Cc: loongarch@lists.linux.dev
> Cc: linux-kernel@vger.kernel.org
> Cc: Jinyang He <hejinyang@loongson.cn>
> Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
>
> Passed vdso_test_getrandom and vdso_test_chacha tests.  Benchmark
> results:
>
> vdso_test_getrandom bench-single:
>
>        vdso: 25000000 times in 0.499490289 seconds
>        libc: 25000000 times in 6.963829873 seconds
>     syscall: 25000000 times in 6.983413486 seconds
>
> vdso_test_getrandom bench-multi:
>        vdso: 25000000 x 256 times in 28.703710823 seconds
>        libc: 25000000 x 256 times in 356.835801784 seconds
>        syscall: 25000000 x 256 times in 338.525837197 seconds
>
> [v4]->v5:
> - Rebase onto crng/random.git:
>   - Remove two selftest patches.
>   - Remove __arch_chacha20_blocks_nostack forward declaration.
> - Squash the remaining selftest patch into the vDSO getrandom
>   implementation patch.
> - Remove ifdef CONFIG_VDSO_GETRANDOM and $(CONFIG_VDSO_GETRANDOM) as
>   they are always true in arch/loongarch.
> - Remove the asm-offsets.c change, which was already unnecessary as of v4.
> - Add comment about rematerializing the constant in the assembly code.
> - Add a prototype for __vdso_getrandom to silence a -Wmissing-prototypes
>   warning.
>
> [v3]->v4:
> - Remove LSX implementation, which isn't much faster than the generic
>   implementation.
> - Rebase onto crng/random.git:
>   - Define __arch_get_k_vdso_rng_data instead of using inline asm to
>     provide the _vdso_rng_data symbol in a magic way.
>   - Remove memset.S.
>   - Use c-getrandom-y to easily include the generic C code.
>   - The benchmark results seem better than v3, maybe related to the TLS
>     refactoring in random.git.
> - Add patches for selftests.
>
> [v2]->v3:
> - Add a generic LoongArch implementation for which LSX isn't needed.
>
> v1->v2:
> - Properly send the series to the list.
>
> [v4]:https://lore.kernel.org/all/20240827132018.88854-1-xry111@xry111.site/
> [v3]:https://lore.kernel.org/all/20240816110717.10249-1-xry111@xry111.site/
> [v2]:https://lore.kernel.org/all/20240815133357.35829-1-xry111@xry111.site/
>
>  arch/loongarch/Kconfig                      |   1 +
>  arch/loongarch/include/asm/vdso/getrandom.h |  44 ++++
>  arch/loongarch/include/asm/vdso/vdso.h      |   6 +
>  arch/loongarch/include/asm/vdso/vsyscall.h  |   8 +
>  arch/loongarch/kernel/asm-offsets.c         |   1 +
>  arch/loongarch/kernel/vdso.c                |   2 +
>  arch/loongarch/vdso/Makefile                |   7 +-
>  arch/loongarch/vdso/vdso.lds.S              |   1 +
>  arch/loongarch/vdso/vgetrandom-chacha.S     | 242 ++++++++++++++++++++
>  arch/loongarch/vdso/vgetrandom.c            |  15 ++
>  tools/arch/loongarch/vdso                   |   1 +
>  tools/testing/selftests/vDSO/Makefile       |   4 +-
>  12 files changed, 329 insertions(+), 3 deletions(-)
>  create mode 100644 arch/loongarch/include/asm/vdso/getrandom.h
>  create mode 100644 arch/loongarch/vdso/vgetrandom-chacha.S
>  create mode 100644 arch/loongarch/vdso/vgetrandom.c
>  create mode 120000 tools/arch/loongarch/vdso
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 70f169210b52..14821c2aba5b 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -190,6 +190,7 @@ config LOONGARCH
>         select TRACE_IRQFLAGS_SUPPORT
>         select USE_PERCPU_NUMA_NODE_ID
>         select USER_STACKTRACE_SUPPORT
> +       select VDSO_GETRANDOM
>         select ZONE_DMA32
>
>  config 32BIT
> diff --git a/arch/loongarch/include/asm/vdso/getrandom.h b/arch/loongarch/include/asm/vdso/getrandom.h
> new file mode 100644
> index 000000000000..04c991f6921d
> --- /dev/null
> +++ b/arch/loongarch/include/asm/vdso/getrandom.h
> @@ -0,0 +1,44 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved.
> + */
> +#ifndef __ASM_VDSO_GETRANDOM_H
> +#define __ASM_VDSO_GETRANDOM_H
> +
> +#ifndef __ASSEMBLY__
> +
> +#include <asm/unistd.h>
> +#include <asm/vdso/vdso.h>
> +
> +static __always_inline ssize_t getrandom_syscall(void *_buffer,
> +                                                size_t _len,
> +                                                unsigned int _flags)
> +{
> +       register long ret asm("a0");
> +       register long nr asm("a7") = __NR_getrandom;
> +       register void *buffer asm("a0") = _buffer;
> +       register size_t len asm("a1") = _len;
> +       register unsigned int flags asm("a2") = _flags;
> +
> +       asm volatile(
> +       "      syscall 0\n"
> +       : "+r" (ret)
> +       : "r" (nr), "r" (buffer), "r" (len), "r" (flags)
> +       : "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7", "$t8",
> +         "memory");
> +
> +       return ret;
> +}
> +
> +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(
> +       void)
As I said before, no line break is needed, because there is no
80-character limit now.
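It can simply be written as:

	static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void)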

> +{
> +       return (const struct vdso_rng_data *)(
> +               get_vdso_data() +
> +               VVAR_LOONGARCH_PAGES_START * PAGE_SIZE +
> +               offsetof(struct loongarch_vdso_data, rng_data));
> +}
Same here, we don't need so many lines.
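E.g. just:

	return (const struct vdso_rng_data *)(get_vdso_data() + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE +
					      offsetof(struct loongarch_vdso_data, rng_data));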

> +
> +#endif /* !__ASSEMBLY__ */
> +
> +#endif /* __ASM_VDSO_GETRANDOM_H */
> diff --git a/arch/loongarch/include/asm/vdso/vdso.h b/arch/loongarch/include/asm/vdso/vdso.h
> index 5a12309d9fb5..e31ac7474513 100644
> --- a/arch/loongarch/include/asm/vdso/vdso.h
> +++ b/arch/loongarch/include/asm/vdso/vdso.h
> @@ -4,6 +4,9 @@
>   * Copyright (C) 2020-2022 Loongson Technology Corporation Limited
>   */
>
> +#ifndef _ASM_VDSO_VDSO_H
> +#define _ASM_VDSO_VDSO_H
> +
>  #ifndef __ASSEMBLY__
>
>  #include <asm/asm.h>
> @@ -16,6 +19,7 @@ struct vdso_pcpu_data {
>
>  struct loongarch_vdso_data {
>         struct vdso_pcpu_data pdata[NR_CPUS];
> +       struct vdso_rng_data rng_data;
>  };
>
>  /*
> @@ -63,3 +67,5 @@ static inline unsigned long get_vdso_data(void)
>  }
>
>  #endif /* __ASSEMBLY__ */
> +
> +#endif
> diff --git a/arch/loongarch/include/asm/vdso/vsyscall.h b/arch/loongarch/include/asm/vdso/vsyscall.h
> index 5de615383a22..b1273ce6f140 100644
> --- a/arch/loongarch/include/asm/vdso/vsyscall.h
> +++ b/arch/loongarch/include/asm/vdso/vsyscall.h
> @@ -8,6 +8,7 @@
>  #include <vdso/datapage.h>
>
>  extern struct vdso_data *vdso_data;
> +extern struct vdso_rng_data *vdso_rng_data;
>
>  /*
>   * Update the vDSO data page to keep in sync with kernel timekeeping.
> @@ -19,6 +20,13 @@ struct vdso_data *__loongarch_get_k_vdso_data(void)
>  }
>  #define __arch_get_k_vdso_data __loongarch_get_k_vdso_data
>
> +static __always_inline
> +struct vdso_rng_data *__loongarch_get_k_vdso_rng_data(void)
> +{
> +       return vdso_rng_data;
> +}
> +#define __arch_get_k_vdso_rng_data __loongarch_get_k_vdso_rng_data
> +
>  /* The asm-generic header needs to be included after the definitions above */
>  #include <asm-generic/vdso/vsyscall.h>
>
> diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c
> index bee9f7a3108f..ab258878d551 100644
> --- a/arch/loongarch/kernel/asm-offsets.c
> +++ b/arch/loongarch/kernel/asm-offsets.c
> @@ -14,6 +14,7 @@
>  #include <asm/ptrace.h>
>  #include <asm/processor.h>
>  #include <asm/ftrace.h>
> +#include <asm/vdso/vdso.h>
>
>  static void __used output_ptreg_defines(void)
>  {
> diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c
> index 90dfccb41c14..2af05ba5f121 100644
> --- a/arch/loongarch/kernel/vdso.c
> +++ b/arch/loongarch/kernel/vdso.c
> @@ -22,6 +22,7 @@
>  #include <vdso/helpers.h>
>  #include <vdso/vsyscall.h>
>  #include <vdso/datapage.h>
> +#include <generated/asm-offsets.h>
>  #include <generated/vdso-offsets.h>
>
>  extern char vdso_start[], vdso_end[];
> @@ -37,6 +38,7 @@ static union {
>  static struct page *vdso_pages[] = { NULL };
>  struct vdso_data *vdso_data = generic_vdso_data.data;
>  struct vdso_pcpu_data *vdso_pdata = loongarch_vdso_data.vdata.pdata;
> +struct vdso_rng_data *vdso_rng_data = &loongarch_vdso_data.vdata.rng_data;
>
>  static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma)
>  {
> diff --git a/arch/loongarch/vdso/Makefile b/arch/loongarch/vdso/Makefile
> index d724d46b07c8..19249d4b3542 100644
> --- a/arch/loongarch/vdso/Makefile
> +++ b/arch/loongarch/vdso/Makefile
> @@ -4,7 +4,8 @@
>  # Include the generic Makefile to check the built vdso.
>  include $(srctree)/lib/vdso/Makefile
>
> -obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o
> +obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o vgetrandom.o \
> +              vgetrandom-chacha.o
>
>  # Common compiler flags between ABIs.
>  ccflags-vdso := \
> @@ -29,6 +30,10 @@ ifneq ($(c-gettimeofday-y),)
>    CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
>  endif
>
> +ifneq ($(c-getrandom-y),)
> +  CFLAGS_vgetrandom.o += -include $(c-getrandom-y)
> +endif
> +
>  # VDSO linker flags.
>  ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
>         $(filter -E%,$(KBUILD_CFLAGS)) -nostdlib -shared \
> diff --git a/arch/loongarch/vdso/vdso.lds.S b/arch/loongarch/vdso/vdso.lds.S
> index 56ad855896de..2c965a597d9e 100644
> --- a/arch/loongarch/vdso/vdso.lds.S
> +++ b/arch/loongarch/vdso/vdso.lds.S
> @@ -63,6 +63,7 @@ VERSION
>                 __vdso_clock_gettime;
>                 __vdso_gettimeofday;
>                 __vdso_rt_sigreturn;
> +               __vdso_getrandom;
I also said before to keep __vdso_rt_sigreturn as the last one, i.e. put
__vdso_getrandom before it.

>         local: *;
>         };
>  }
> diff --git a/arch/loongarch/vdso/vgetrandom-chacha.S b/arch/loongarch/vdso/vgetrandom-chacha.S
> new file mode 100644
> index 000000000000..7e86a50f6e85
> --- /dev/null
> +++ b/arch/loongarch/vdso/vgetrandom-chacha.S
> @@ -0,0 +1,242 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved.
> + */
> +
> +#include <asm/asm.h>
> +#include <asm/regdef.h>
> +#include <linux/linkage.h>
> +
> +.text
> +
> +/* Salsa20 quarter-round */
> +.macro QR      a b c d
> +       add.w           \a, \a, \b
> +       xor             \d, \d, \a
> +       rotri.w         \d, \d, 16
> +
> +       add.w           \c, \c, \d
> +       xor             \b, \b, \c
> +       rotri.w         \b, \b, 20
> +
> +       add.w           \a, \a, \b
> +       xor             \d, \d, \a
> +       rotri.w         \d, \d, 24
> +
> +       add.w           \c, \c, \d
> +       xor             \b, \b, \c
> +       rotri.w         \b, \b, 25
> +.endm
> +
> +/*
> + * Very basic LoongArch implementation of ChaCha20. Produces a given positive
> + * number of blocks of output with a nonce of 0, taking an input key and
> + * 8-byte counter. Importantly does not spill to the stack. Its arguments
> + * are:
> + *
> + *     a0: output bytes
> + *     a1: 32-byte key input
> + *     a2: 8-byte counter input/output
> + *     a3: number of 64-byte blocks to write to output
> + */
> +SYM_FUNC_START(__arch_chacha20_blocks_nostack)
> +
> +/* We don't need a frame pointer */
> +#define s9             fp
> +
> +#define output         a0
> +#define key            a1
> +#define counter                a2
> +#define nblocks                a3
> +#define i              a4
> +#define state0         s0
> +#define state1         s1
> +#define state2         s2
> +#define state3         s3
> +#define state4         s4
> +#define state5         s5
> +#define state6         s6
> +#define state7         s7
> +#define state8         s8
> +#define state9         s9
> +#define state10                a5
> +#define state11                a6
> +#define state12                a7
> +#define state13                t0
> +#define state14                t1
> +#define state15                t2
> +#define cnt_lo         t3
> +#define cnt_hi         t4
> +#define copy0          t5
> +#define copy1          t6
> +#define copy2          t7
> +
> +/* Reuse i as copy3 */
> +#define copy3          i
> +
> +       /*
> +        * The ABI requires s0-s9 saved, and sp aligned to 16-byte.
> +        * This does not violate the stack-less requirement: no sensitive data
> +        * is spilled onto the stack.
> +        */
> +       PTR_ADDI        sp, sp, (-SZREG * 10) & STACK_ALIGN
> +       REG_S           s0, sp, 0
> +       REG_S           s1, sp, SZREG
> +       REG_S           s2, sp, SZREG * 2
> +       REG_S           s3, sp, SZREG * 3
> +       REG_S           s4, sp, SZREG * 4
> +       REG_S           s5, sp, SZREG * 5
> +       REG_S           s6, sp, SZREG * 6
> +       REG_S           s7, sp, SZREG * 7
> +       REG_S           s8, sp, SZREG * 8
> +       REG_S           s9, sp, SZREG * 9
> +
> +       li.w            copy0, 0x61707865
> +       li.w            copy1, 0x3320646e
> +       li.w            copy2, 0x79622d32
> +
> +       ld.w            cnt_lo, counter, 0
> +       ld.w            cnt_hi, counter, 4
> +
> +.Lblock:
> +       /* state[0,1,2,3] = "expand 32-byte k" */
> +       move            state0, copy0
> +       move            state1, copy1
> +       move            state2, copy2
> +       li.w            state3, 0x6b206574
> +
> +       /* state[4,5,..,11] = key */
> +       ld.w            state4, key, 0
> +       ld.w            state5, key, 4
> +       ld.w            state6, key, 8
> +       ld.w            state7, key, 12
> +       ld.w            state8, key, 16
> +       ld.w            state9, key, 20
> +       ld.w            state10, key, 24
> +       ld.w            state11, key, 28
> +
> +       /* state[12,13] = counter */
> +       move            state12, cnt_lo
> +       move            state13, cnt_hi
> +
> +       /* state[14,15] = 0 */
> +       move            state14, zero
> +       move            state15, zero
> +
> +       li.w            i, 10
> +.Lpermute:
> +       /* odd round */
> +       QR              state0, state4, state8, state12
> +       QR              state1, state5, state9, state13
> +       QR              state2, state6, state10, state14
> +       QR              state3, state7, state11, state15
> +
> +       /* even round */
> +       QR              state0, state5, state10, state15
> +       QR              state1, state6, state11, state12
> +       QR              state2, state7, state8, state13
> +       QR              state3, state4, state9, state14
> +
> +       addi.w          i, i, -1
> +       bnez            i, .Lpermute
> +
> +       /*
> +        * copy[3] = "expa", materialize it here because copy[3] shares the
> +        * same register with i which just became dead.
> +        */
> +       li.w            copy3, 0x6b206574
> +
> +       /* output[0,1,2,3] = copy[0,1,2,3] + state[0,1,2,3] */
> +       add.w           state0, state0, copy0
> +       add.w           state1, state1, copy1
> +       add.w           state2, state2, copy2
> +       add.w           state3, state3, copy3
> +       st.w            state0, output, 0
> +       st.w            state1, output, 4
> +       st.w            state2, output, 8
> +       st.w            state3, output, 12
> +
> +       /* from now on state[0,1,2,3] are scratch registers  */
> +
> +       /* state[0,1,2,3] = lo32(key) */
> +       ld.w            state0, key, 0
> +       ld.w            state1, key, 4
> +       ld.w            state2, key, 8
> +       ld.w            state3, key, 12
> +
> +       /* output[4,5,6,7] = state[0,1,2,3] + state[4,5,6,7] */
> +       add.w           state4, state4, state0
> +       add.w           state5, state5, state1
> +       add.w           state6, state6, state2
> +       add.w           state7, state7, state3
> +       st.w            state4, output, 16
> +       st.w            state5, output, 20
> +       st.w            state6, output, 24
> +       st.w            state7, output, 28
> +
> +       /* state[0,1,2,3] = hi32(key) */
> +       ld.w            state0, key, 16
> +       ld.w            state1, key, 20
> +       ld.w            state2, key, 24
> +       ld.w            state3, key, 28
> +
> +       /* output[8,9,10,11] = state[0,1,2,3] + state[8,9,10,11] */
> +       add.w           state8, state8, state0
> +       add.w           state9, state9, state1
> +       add.w           state10, state10, state2
> +       add.w           state11, state11, state3
> +       st.w            state8, output, 32
> +       st.w            state9, output, 36
> +       st.w            state10, output, 40
> +       st.w            state11, output, 44
> +
> +       /* output[12,13,14,15] = state[12,13,14,15] + [cnt_lo, cnt_hi, 0, 0] */
> +       add.w           state12, state12, cnt_lo
> +       add.w           state13, state13, cnt_hi
> +       st.w            state12, output, 48
> +       st.w            state13, output, 52
> +       st.w            state14, output, 56
> +       st.w            state15, output, 60
> +
> +       /* ++counter  */
> +       addi.w          cnt_lo, cnt_lo, 1
> +       sltui           state0, cnt_lo, 1
> +       add.w           cnt_hi, cnt_hi, state0
> +
> +       /* output += 64 */
> +       PTR_ADDI        output, output, 64
> +       /* --nblocks */
> +       PTR_ADDI        nblocks, nblocks, -1
> +       bnez            nblocks, .Lblock
> +
> +       /* counter = [cnt_lo, cnt_hi] */
> +       st.w            cnt_lo, counter, 0
> +       st.w            cnt_hi, counter, 4
> +
> +       /*
> +        * Zero out the potentially sensitive regs, in case nothing uses these
> +        * again. As of now copy[0,1,2,3] just contain "expand 32-byte k" and
> +        * state[0,...,9] are s0-s9, which we'll restore in the epilogue, so we
> +        * only need to zero state[10,...,15].
> +        */
> +       move            state10, zero
> +       move            state11, zero
> +       move            state12, zero
> +       move            state13, zero
> +       move            state14, zero
> +       move            state15, zero
> +
> +       REG_L           s0, sp, 0
> +       REG_L           s1, sp, SZREG
> +       REG_L           s2, sp, SZREG * 2
> +       REG_L           s3, sp, SZREG * 3
> +       REG_L           s4, sp, SZREG * 4
> +       REG_L           s5, sp, SZREG * 5
> +       REG_L           s6, sp, SZREG * 6
> +       REG_L           s7, sp, SZREG * 7
> +       REG_L           s8, sp, SZREG * 8
> +       REG_L           s9, sp, SZREG * 9
> +       PTR_ADDI        sp, sp, -((-SZREG * 10) & STACK_ALIGN)
> +
> +       jr              ra
> +SYM_FUNC_END(__arch_chacha20_blocks_nostack)
> diff --git a/arch/loongarch/vdso/vgetrandom.c b/arch/loongarch/vdso/vgetrandom.c
> new file mode 100644
> index 000000000000..68e44b3f1d49
> --- /dev/null
> +++ b/arch/loongarch/vdso/vgetrandom.c
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved.
> + */
> +#include <linux/types.h>
> +
> +ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags,
> +                        void *opaque_state, size_t opaque_len);
Please see commit 42874e4eb35bdfc54f8514685e50434098ba4f6; it is
better to avoid such hacks.
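
I.e. keep only the definition in vgetrandom.c and put the declaration in a
shared header instead, roughly (the exact header is up to you):

	/* in a common vDSO header, not a local forward declaration */
	ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags,
				 void *opaque_state, size_t opaque_len);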

> +
> +ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags,
> +                        void *opaque_state, size_t opaque_len)
> +{
> +       return __cvdso_getrandom(buffer, len, flags, opaque_state,
> +                                opaque_len);
Again, no line breaks needed.


Huacai

> +}
> diff --git a/tools/arch/loongarch/vdso b/tools/arch/loongarch/vdso
> new file mode 120000
> index 000000000000..ebda43a82db7
> --- /dev/null
> +++ b/tools/arch/loongarch/vdso
> @@ -0,0 +1 @@
> +../../../arch/loongarch/vdso
> \ No newline at end of file
> diff --git a/tools/testing/selftests/vDSO/Makefile b/tools/testing/selftests/vDSO/Makefile
> index e21e78aae24d..606ce5f5c2a4 100644
> --- a/tools/testing/selftests/vDSO/Makefile
> +++ b/tools/testing/selftests/vDSO/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  uname_M := $(shell uname -m 2>/dev/null || echo not)
> -ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
> +ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/ -e /loongarch/s/[0-9]//g)
>
>  TEST_GEN_PROGS := vdso_test_gettimeofday
>  TEST_GEN_PROGS += vdso_test_getcpu
> @@ -10,7 +10,7 @@ ifeq ($(ARCH),$(filter $(ARCH),x86 x86_64))
>  TEST_GEN_PROGS += vdso_standalone_test_x86
>  endif
>  TEST_GEN_PROGS += vdso_test_correctness
> -ifeq ($(uname_M),x86_64)
> +ifeq ($(uname_M),$(filter $(uname_M),x86_64 loongarch64))
>  TEST_GEN_PROGS += vdso_test_getrandom
>  TEST_GEN_PROGS += vdso_test_chacha
>  endif
>
> base-commit: ec309dd126280ca4ef4087e02246a6f61f36c8d1
> --
> 2.46.0
>
Jason A. Donenfeld Aug. 29, 2024, 1:27 p.m. UTC | #5
One small question just occurred to me:

> +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(
> +	void)
> +{
> +	return (const struct vdso_rng_data *)(
> +		get_vdso_data() +
> +		VVAR_LOONGARCH_PAGES_START * PAGE_SIZE +
> +		offsetof(struct loongarch_vdso_data, rng_data));
> +}

Did you test this in a TIMENS? On x86, I had to deal with the page
offsets switching around depending on whether there was a TIMENS. I
tested this in my test harness with some basic code like:

       if (argc == 1) {
               if (unshare(CLONE_NEWTIME))
                       panic("unshare(CLONE_NEWTIME)");
               if (!fork()) {
                       if (execl(argv[0], argv[0], "now-in-timens", (char *)NULL))
                               panic("execl");
               }
               wait(NULL);
               poweroff();
       }

Because unlike other namespaces, the time one only becomes active after
fork/exec.

But maybe loongarch is more organized and you don't need any special
handling in __arch_get_vdso...data() functions like I needed on x86.
Just thought I should check.

Jason
Jason A. Donenfeld Aug. 29, 2024, 1:37 p.m. UTC | #6
On Thu, Aug 29, 2024 at 03:27:33PM +0200, Jason A. Donenfeld wrote:
> One small question just occurred to me:
> 
> > +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(
> > +	void)
> > +{
> > +	return (const struct vdso_rng_data *)(
> > +		get_vdso_data() +
> > +		VVAR_LOONGARCH_PAGES_START * PAGE_SIZE +
> > +		offsetof(struct loongarch_vdso_data, rng_data));
> > +}
> 
> Did you test this in a TIMENS? On x86, I had to deal with the page
> offsets switching around depending on whether there was a TIMENS. I
> tested this in my test harness with some basic code like:
> 
>        if (argc == 1) {
>                if (unshare(CLONE_NEWTIME))
>                        panic("unshare(CLONE_NEWTIME)");
>                if (!fork()) {
>                        if (execl(argv[0], argv[0], "now-in-timens", (char *)NULL))
>                                panic("execl");
>                }
>                wait(NULL);
>                poweroff();
>        }
> 
> Because unlike other namespaces, the time one only becomes active after
> fork/exec.
> 
> But maybe loongarch is more organized and you don't need any special
> handling in __arch_get_vdso...data() functions like I needed on x86.
> Just thought I should check.

Normal results:

   vdso: 25000000 times in 0.287330836 seconds
   libc: 25000000 times in 4.480710835 seconds
syscall: 25000000 times in 4.411098048 seconds

After applying

diff --git a/arch/x86/include/asm/vdso/getrandom.h b/arch/x86/include/asm/vdso/getrandom.h
index ff5334ad32a0..5cb1b318ebe3 100644
--- a/arch/x86/include/asm/vdso/getrandom.h
+++ b/arch/x86/include/asm/vdso/getrandom.h
@@ -32,8 +32,6 @@ static __always_inline ssize_t getrandom_syscall(void *buffer, size_t len, unsig

 static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void)
 {
-	if (IS_ENABLED(CONFIG_TIME_NS) && __vdso_data->clock_mode == VDSO_CLOCKMODE_TIMENS)
-		return (void *)&__vdso_rng_data + ((void *)&__timens_vdso_data - (void *)&__vdso_data);
 	return &__vdso_rng_data;
 }

the results are:

   vdso: 25000000 times in 4.403789593 seconds
   libc: 25000000 times in 4.466771093 seconds
syscall: 25000000 times in 4.428145416 seconds

The difference is that when it finds the shared data in the wrong place,
it thinks the RNG is uninitialized, so it always falls back to the
syscall, hence all three times being the same.
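
For reference, the generic code checks the shared data's is_ready flag
before doing anything else, roughly like this (simplified, not the literal
__cvdso_getrandom source):

	if (!READ_ONCE(rng_info->is_ready))
		return getrandom_syscall(buffer, len, flags);

so a wrong data pointer silently degrades to the syscall path rather than
failing outright.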

If you're unsure how timens handling works on loongarch, try this test
yourself and see what you get.

Jason
Xi Ruoyao Aug. 29, 2024, 2:06 p.m. UTC | #7
On Thu, 2024-08-29 at 15:37 +0200, Jason A. Donenfeld wrote:
> On Thu, Aug 29, 2024 at 03:27:33PM +0200, Jason A. Donenfeld wrote:
> > One small question just occurred to me:
> > 
> > > +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(
> > > +	void)
> > > +{
> > > +	return (const struct vdso_rng_data *)(
> > > +		get_vdso_data() +
> > > +		VVAR_LOONGARCH_PAGES_START * PAGE_SIZE +
> > > +		offsetof(struct loongarch_vdso_data, rng_data));
> > > +}
> > 
> > Did you test this in a TIMENS? On x86, I had to deal with the page
> > offsets switching around depending on whether there was a TIMENS. I
> > tested this in my test harness with some basic code like:
> > 
> >        if (argc == 1) {
> >                if (unshare(CLONE_NEWTIME))
> >                        panic("unshare(CLONE_NEWTIME)");
> >                if (!fork()) {
> >                        if (execl(argv[0], argv[0], "now-in-timens", (char *)NULL))
> >                                panic("execl");
> >                }
> >                wait(NULL);
> >                poweroff();
> >        }
> > 
> > Because unlike other namespaces, the time one only becomes active after
> > fork/exec.
> > 
> > But maybe loongarch is more organized and you don't need any special
> > handling in __arch_get_vdso...data() functions like I needed on x86.
> > Just thought I should check.
> 
> Normal results:
> 
>    vdso: 25000000 times in 0.287330836 seconds
>    libc: 25000000 times in 4.480710835 seconds
> syscall: 25000000 times in 4.411098048 seconds
> 
> After applying
> 
> diff --git a/arch/x86/include/asm/vdso/getrandom.h b/arch/x86/include/asm/vdso/getrandom.h
> index ff5334ad32a0..5cb1b318ebe3 100644
> --- a/arch/x86/include/asm/vdso/getrandom.h
> +++ b/arch/x86/include/asm/vdso/getrandom.h
> @@ -32,8 +32,6 @@ static __always_inline ssize_t getrandom_syscall(void *buffer, size_t len, unsig
> 
>  static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void)
>  {
> -	if (IS_ENABLED(CONFIG_TIME_NS) && __vdso_data->clock_mode == VDSO_CLOCKMODE_TIMENS)
> -		return (void *)&__vdso_rng_data + ((void *)&__timens_vdso_data - (void *)&__vdso_data);
>  	return &__vdso_rng_data;
>  }
> 
> the results are:
> 
>    vdso: 25000000 times in 4.403789593 seconds
>    libc: 25000000 times in 4.466771093 seconds
> syscall: 25000000 times in 4.428145416 seconds
> 
> The difference is that when it finds the shared data in the wrong place,
> it thinks the RNG is uninitialized, so it always falls back to the
> syscall, hence all three times being the same.
> 
> If you're unsure how timens handling works on loongarch, try this test
> yourself and see what you get.

$ unshare -r -T --boottime $((365*24*3600))
# uptime
 21:54:36 up 365 days,  5:38,  0 user,  load average: 0.05, 0.08, 2.82
# /home/xry111/git-repos/linux/tools/testing/selftests/vDSO/vdso_test_getrandom bench-single
   vdso: 25000000 times in 0.499528591 seconds
   libc: 25000000 times in 6.968980040 seconds
syscall: 25000000 times in 6.987357071 seconds

So it seems normal in a time ns.

And from a comment in arch/loongarch/include/asm/vdso/vdso.h:

/*
 * The layout of vvar:
 *
 *                      high
 * +---------------------+--------------------------+
 * | loongarch vdso data | LOONGARCH_VDSO_DATA_SIZE |
 * +---------------------+--------------------------+
 * |  time-ns vdso data  |        PAGE_SIZE         |
 * +---------------------+--------------------------+
 * |  generic vdso data  |        PAGE_SIZE         |
 * +---------------------+--------------------------+
 *                      low
 */

And VVAR_LOONGARCH_PAGES_START is 2:

enum vvar_pages {
    VVAR_GENERIC_PAGE_OFFSET,
    VVAR_TIMENS_PAGE_OFFSET,
    VVAR_LOONGARCH_PAGES_START,
    VVAR_LOONGARCH_PAGES_END = VVAR_LOONGARCH_PAGES_START +
LOONGARCH_VDSO_DATA_PAGES - 1,
    VVAR_NR_PAGES,
};

So get_vdso_data() + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE should have
already "jumped over" the time-ns vdso data.
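
I.e. no matter whether the process is in a time namespace, the rng data is
always found at the same fixed offset (VVAR_LOONGARCH_PAGES_START == 2):

	get_vdso_data() + 2 * PAGE_SIZE +
		offsetof(struct loongarch_vdso_data, rng_data)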

OTOH it seems we are wasting a page if !CONFIG_TIME_NS.	Maybe:

enum vvar_pages {
    VVAR_GENERIC_PAGE_OFFSET,
#ifdef CONFIG_TIME_NS
    VVAR_TIMENS_PAGE_OFFSET,
#endif
    VVAR_LOONGARCH_PAGES_START,
    VVAR_LOONGARCH_PAGES_END = VVAR_LOONGARCH_PAGES_START +
LOONGARCH_VDSO_DATA_PAGES - 1,
    VVAR_NR_PAGES,
};

Tiezhu: what do you think?
Jason A. Donenfeld Aug. 29, 2024, 2:44 p.m. UTC | #8
On Thu, Aug 29, 2024 at 4:06 PM Xi Ruoyao <xry111@xry111.site> wrote:
>
> On Thu, 2024-08-29 at 15:37 +0200, Jason A. Donenfeld wrote:
> > On Thu, Aug 29, 2024 at 03:27:33PM +0200, Jason A. Donenfeld wrote:
> > > One small question just occurred to me:
> > >
> > > > +static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(
> > > > + void)
> > > > +{
> > > > + return (const struct vdso_rng_data *)(
> > > > +         get_vdso_data() +
> > > > +         VVAR_LOONGARCH_PAGES_START * PAGE_SIZE +
> > > > +         offsetof(struct loongarch_vdso_data, rng_data));
> > > > +}
> > >
> > > Did you test this in a TIMENS? On x86, I had to deal with the page
> > > offsets switching around depending on whether there was a TIMENS. I
> > > tested this in my test harness with some basic code like:
> > >
> > >        if (argc == 1) {
> > >                if (unshare(CLONE_NEWTIME))
> > >                        panic("unshare(CLONE_NEWTIME)");
> > >                if (!fork()) {
> > >                        if (execl(argv[0], argv[0], "now-in-timens", (char *)NULL))
> > >                                panic("execl");
> > >                }
> > >                wait(NULL);
> > >                poweroff();
> > >        }
> > >
> > > Because unlike other namespaces, the time one only becomes active after
> > > fork/exec.
> > >
> > > But maybe loongarch is more organized and you don't need any special
> > > handling in __arch_get_vdso...data() functions like I needed on x86.
> > > Just thought I should check.
> >
> > Normal results:
> >
> >    vdso: 25000000 times in 0.287330836 seconds
> >    libc: 25000000 times in 4.480710835 seconds
> > syscall: 25000000 times in 4.411098048 seconds
> >
> > After applying
> >
> > diff --git a/arch/x86/include/asm/vdso/getrandom.h b/arch/x86/include/asm/vdso/getrandom.h
> > index ff5334ad32a0..5cb1b318ebe3 100644
> > --- a/arch/x86/include/asm/vdso/getrandom.h
> > +++ b/arch/x86/include/asm/vdso/getrandom.h
> > @@ -32,8 +32,6 @@ static __always_inline ssize_t getrandom_syscall(void *buffer, size_t len, unsig
> >
> >  static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void)
> >  {
> > -     if (IS_ENABLED(CONFIG_TIME_NS) && __vdso_data->clock_mode == VDSO_CLOCKMODE_TIMENS)
> > -             return (void *)&__vdso_rng_data + ((void *)&__timens_vdso_data - (void *)&__vdso_data);
> >       return &__vdso_rng_data;
> >  }
> >
> > the results are:
> >
> >    vdso: 25000000 times in 4.403789593 seconds
> >    libc: 25000000 times in 4.466771093 seconds
> > syscall: 25000000 times in 4.428145416 seconds
> >
> > The difference is that when it finds the shared data in the wrong place,
> > it thinks the RNG is uninitialized, so it always falls back to the
> > syscall, hence all three times being the same.
> >
> > If you're unsure how timens handling works on loongarch, try this test
> > yourself and see what you get.
>
> $ unshare -r -T --boottime $((365*24*3600))
> # uptime
>  21:54:36 up 365 days,  5:38,  0 user,  load average: 0.05, 0.08, 2.82
> # /home/xry111/git-repos/linux/tools/testing/selftests/vDSO/vdso_test_getrandom bench-single
>    vdso: 25000000 times in 0.499528591 seconds
>    libc: 25000000 times in 6.968980040 seconds
> syscall: 25000000 times in 6.987357071 seconds
>
> So it seems normal in a time ns.
>
> And from a comment in arch/loongarch/include/asm/vdso/vdso.h:
>
> /*
>  * The layout of vvar:
>  *
>  *                      high
>  * +---------------------+--------------------------+
>  * | loongarch vdso data | LOONGARCH_VDSO_DATA_SIZE |
>  * +---------------------+--------------------------+
>  * |  time-ns vdso data  |        PAGE_SIZE         |
>  * +---------------------+--------------------------+
>  * |  generic vdso data  |        PAGE_SIZE         |
>  * +---------------------+--------------------------+
>  *                      low
>  */
>
> And VVAR_LOONGARCH_PAGES_START is 2:
>
> enum vvar_pages {
>     VVAR_GENERIC_PAGE_OFFSET,
>     VVAR_TIMENS_PAGE_OFFSET,
>     VVAR_LOONGARCH_PAGES_START,
>     VVAR_LOONGARCH_PAGES_END = VVAR_LOONGARCH_PAGES_START +
> LOONGARCH_VDSO_DATA_PAGES - 1,
>     VVAR_NR_PAGES,
> };
>
> So get_vdso_data() + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE should have
> already "jumped over" the time-ns vdso data.

Oh good. Thanks for checking. So it sounds like there's just Huacai's
set of comments and we're good.
Xi Ruoyao Aug. 30, 2024, 9:05 a.m. UTC | #9
On Thu, 2024-08-29 at 16:44 +0200, Jason A. Donenfeld wrote:
> > So get_vdso_data() + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE should have
> > already "jumped over" the time-ns vdso data.
> 
> Oh good. Thanks for checking. So it sounds like there's just Huacai's
> set of comments and we're good.

Both Huacai and I (we've discussed a little off the list) think it seems
more natural to separate the implementation and the selftest into two
patches.  Do you think it's acceptable?  If not, we can live with one
consolidated patch though.
Jason A. Donenfeld Aug. 30, 2024, 12:24 p.m. UTC | #10
On Fri, Aug 30, 2024 at 05:05:38PM +0800, Xi Ruoyao wrote:
> On Thu, 2024-08-29 at 16:44 +0200, Jason A. Donenfeld wrote:
> > > So get_vdso_data() + VVAR_LOONGARCH_PAGES_START * PAGE_SIZE should have
> > > already "jumped over" the time-ns vdso data.
> > 
> > Oh good. Thanks for checking. So it sounds like there's just Huacai's
> > set of comments and we're good.
> 
> Both Huacai and I (we've discussed a little off the list) think it seems
> more natural to separate the implementation and the self test into two
> patches.  Do you think it's acceptable?  If not we can live with one
> consolidated patch though.

I'm not such a fan, as wiring up the test makes the whole thing more
"hermetic", but it also doesn't matter to me that much, so if you two
prefer that, that's fine with me and I'll commit it like that.
Xi Ruoyao Aug. 31, 2024, 6:40 a.m. UTC | #11
On Thu, 2024-08-29 at 21:18 +0800, Huacai Chen wrote:
> > -obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o
> > +obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o vgetrandom.o \
> > +              vgetrandom-chacha.o

Huacai: do you prefer to remove this line break as well, or does the
Makefile still have a line width limit?
Huacai Chen Aug. 31, 2024, 7:10 a.m. UTC | #12
On Sat, Aug 31, 2024 at 2:40 PM Xi Ruoyao <xry111@xry111.site> wrote:
>
> On Thu, 2024-08-29 at 21:18 +0800, Huacai Chen wrote:
> > > -obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o
> > > +obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o vgetrandom.o \
> > > +              vgetrandom-chacha.o
>
> Huacai: do you prefer to remove this line break as well, or does the
> Makefile still have a line width limit?
Also no limit, but a Makefile is somewhat different because there are
no "statements", so line breaks don't affect our understanding.

So, the line break is fine here (but my own preference is to keep
sigreturn.o last).

Huacai

>
> --
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University
Christophe Leroy Aug. 31, 2024, 8:54 a.m. UTC | #13
On 31/08/2024 at 09:10, Huacai Chen wrote:
> 
> On Sat, Aug 31, 2024 at 2:40 PM Xi Ruoyao <xry111@xry111.site> wrote:
>>
>> On Thu, 2024-08-29 at 21:18 +0800, Huacai Chen wrote:
>>>> -obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o
>>>> +obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o vgetrandom.o \
>>>> +              vgetrandom-chacha.o
>>
>> Huacai: do you prefer to remove this line break as well, or Makefile
>> still has a line width limit?
> Also no limit, but Makefile is more or less different because there is
> no "statement" and doesn't affect our understanding.
> 
> So, line break is fine here (but for my own preference I like to keep
> sigreturn.o at last).

To avoid the line break you can leave the first line unmodified and add:

obj-vdso-y += vgetrandom.o vgetrandom-chacha.o
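
I.e. the Makefile ends up with:

obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o
obj-vdso-y += vgetrandom.o vgetrandom-chacha.o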

Christophe

Patch

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 70f169210b52..14821c2aba5b 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -190,6 +190,7 @@  config LOONGARCH
 	select TRACE_IRQFLAGS_SUPPORT
 	select USE_PERCPU_NUMA_NODE_ID
 	select USER_STACKTRACE_SUPPORT
+	select VDSO_GETRANDOM
 	select ZONE_DMA32
 
 config 32BIT
diff --git a/arch/loongarch/include/asm/vdso/getrandom.h b/arch/loongarch/include/asm/vdso/getrandom.h
new file mode 100644
index 000000000000..04c991f6921d
--- /dev/null
+++ b/arch/loongarch/include/asm/vdso/getrandom.h
@@ -0,0 +1,44 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved.
+ */
+#ifndef __ASM_VDSO_GETRANDOM_H
+#define __ASM_VDSO_GETRANDOM_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/unistd.h>
+#include <asm/vdso/vdso.h>
+
+static __always_inline ssize_t getrandom_syscall(void *_buffer,
+						 size_t _len,
+						 unsigned int _flags)
+{
+	register long ret asm("a0");
+	register long nr asm("a7") = __NR_getrandom;
+	register void *buffer asm("a0") = _buffer;
+	register size_t len asm("a1") = _len;
+	register unsigned int flags asm("a2") = _flags;
+
+	asm volatile(
+	"      syscall 0\n"
+	: "+r" (ret)
+	: "r" (nr), "r" (buffer), "r" (len), "r" (flags)
+	: "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7", "$t8",
+	  "memory");
+
+	return ret;
+}
+
+static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(
+	void)
+{
+	return (const struct vdso_rng_data *)(
+		get_vdso_data() +
+		VVAR_LOONGARCH_PAGES_START * PAGE_SIZE +
+		offsetof(struct loongarch_vdso_data, rng_data));
+}
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_GETRANDOM_H */
diff --git a/arch/loongarch/include/asm/vdso/vdso.h b/arch/loongarch/include/asm/vdso/vdso.h
index 5a12309d9fb5..e31ac7474513 100644
--- a/arch/loongarch/include/asm/vdso/vdso.h
+++ b/arch/loongarch/include/asm/vdso/vdso.h
@@ -4,6 +4,9 @@ 
  * Copyright (C) 2020-2022 Loongson Technology Corporation Limited
  */
 
+#ifndef _ASM_VDSO_VDSO_H
+#define _ASM_VDSO_VDSO_H
+
 #ifndef __ASSEMBLY__
 
 #include <asm/asm.h>
@@ -16,6 +19,7 @@  struct vdso_pcpu_data {
 
 struct loongarch_vdso_data {
 	struct vdso_pcpu_data pdata[NR_CPUS];
+	struct vdso_rng_data rng_data;
 };
 
 /*
@@ -63,3 +67,5 @@  static inline unsigned long get_vdso_data(void)
 }
 
 #endif /* __ASSEMBLY__ */
+
+#endif
diff --git a/arch/loongarch/include/asm/vdso/vsyscall.h b/arch/loongarch/include/asm/vdso/vsyscall.h
index 5de615383a22..b1273ce6f140 100644
--- a/arch/loongarch/include/asm/vdso/vsyscall.h
+++ b/arch/loongarch/include/asm/vdso/vsyscall.h
@@ -8,6 +8,7 @@ 
 #include <vdso/datapage.h>
 
 extern struct vdso_data *vdso_data;
+extern struct vdso_rng_data *vdso_rng_data;
 
 /*
  * Update the vDSO data page to keep in sync with kernel timekeeping.
@@ -19,6 +20,13 @@  struct vdso_data *__loongarch_get_k_vdso_data(void)
 }
 #define __arch_get_k_vdso_data __loongarch_get_k_vdso_data
 
+static __always_inline
+struct vdso_rng_data *__loongarch_get_k_vdso_rng_data(void)
+{
+	return vdso_rng_data;
+}
+#define __arch_get_k_vdso_rng_data __loongarch_get_k_vdso_rng_data
+
 /* The asm-generic header needs to be included after the definitions above */
 #include <asm-generic/vdso/vsyscall.h>
 
diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c
index bee9f7a3108f..ab258878d551 100644
--- a/arch/loongarch/kernel/asm-offsets.c
+++ b/arch/loongarch/kernel/asm-offsets.c
@@ -14,6 +14,7 @@ 
 #include <asm/ptrace.h>
 #include <asm/processor.h>
 #include <asm/ftrace.h>
+#include <asm/vdso/vdso.h>
 
 static void __used output_ptreg_defines(void)
 {
diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c
index 90dfccb41c14..2af05ba5f121 100644
--- a/arch/loongarch/kernel/vdso.c
+++ b/arch/loongarch/kernel/vdso.c
@@ -22,6 +22,7 @@ 
 #include <vdso/helpers.h>
 #include <vdso/vsyscall.h>
 #include <vdso/datapage.h>
+#include <generated/asm-offsets.h>
 #include <generated/vdso-offsets.h>
 
 extern char vdso_start[], vdso_end[];
@@ -37,6 +38,7 @@  static union {
 static struct page *vdso_pages[] = { NULL };
 struct vdso_data *vdso_data = generic_vdso_data.data;
 struct vdso_pcpu_data *vdso_pdata = loongarch_vdso_data.vdata.pdata;
+struct vdso_rng_data *vdso_rng_data = &loongarch_vdso_data.vdata.rng_data;
 
 static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma)
 {
diff --git a/arch/loongarch/vdso/Makefile b/arch/loongarch/vdso/Makefile
index d724d46b07c8..19249d4b3542 100644
--- a/arch/loongarch/vdso/Makefile
+++ b/arch/loongarch/vdso/Makefile
@@ -4,7 +4,8 @@ 
 # Include the generic Makefile to check the built vdso.
 include $(srctree)/lib/vdso/Makefile
 
-obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o
+obj-vdso-y := elf.o vgetcpu.o vgettimeofday.o sigreturn.o vgetrandom.o \
+              vgetrandom-chacha.o
 
 # Common compiler flags between ABIs.
 ccflags-vdso := \
@@ -29,6 +30,10 @@  ifneq ($(c-gettimeofday-y),)
   CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
 endif
 
+ifneq ($(c-getrandom-y),)
+  CFLAGS_vgetrandom.o += -include $(c-getrandom-y)
+endif
+
 # VDSO linker flags.
 ldflags-y := -Bsymbolic --no-undefined -soname=linux-vdso.so.1 \
 	$(filter -E%,$(KBUILD_CFLAGS)) -nostdlib -shared \
diff --git a/arch/loongarch/vdso/vdso.lds.S b/arch/loongarch/vdso/vdso.lds.S
index 56ad855896de..2c965a597d9e 100644
--- a/arch/loongarch/vdso/vdso.lds.S
+++ b/arch/loongarch/vdso/vdso.lds.S
@@ -63,6 +63,7 @@  VERSION
 		__vdso_clock_gettime;
 		__vdso_gettimeofday;
 		__vdso_rt_sigreturn;
+		__vdso_getrandom;
 	local: *;
 	};
 }
diff --git a/arch/loongarch/vdso/vgetrandom-chacha.S b/arch/loongarch/vdso/vgetrandom-chacha.S
new file mode 100644
index 000000000000..7e86a50f6e85
--- /dev/null
+++ b/arch/loongarch/vdso/vgetrandom-chacha.S
@@ -0,0 +1,242 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved.
+ */
+
+#include <asm/asm.h>
+#include <asm/regdef.h>
+#include <linux/linkage.h>
+
+.text
+
+/* ChaCha quarter-round */
+.macro	QR	a b c d
+	add.w		\a, \a, \b
+	xor		\d, \d, \a
+	rotri.w		\d, \d, 16
+
+	add.w		\c, \c, \d
+	xor		\b, \b, \c
+	rotri.w		\b, \b, 20
+
+	add.w		\a, \a, \b
+	xor		\d, \d, \a
+	rotri.w		\d, \d, 24
+
+	add.w		\c, \c, \d
+	xor		\b, \b, \c
+	rotri.w		\b, \b, 25
+.endm
+
+/*
+ * Very basic LoongArch implementation of ChaCha20. Produces a given positive
+ * number of blocks of output with a nonce of 0, taking an input key and
+ * 8-byte counter. Importantly does not spill to the stack. Its arguments
+ * are:
+ *
+ *	a0: output bytes
+ *	a1: 32-byte key input
+ *	a2: 8-byte counter input/output
+ *	a3: number of 64-byte blocks to write to output
+ */
+SYM_FUNC_START(__arch_chacha20_blocks_nostack)
+
+/* We don't need a frame pointer */
+#define s9		fp
+
+#define output		a0
+#define key		a1
+#define counter		a2
+#define nblocks		a3
+#define i		a4
+#define state0		s0
+#define state1		s1
+#define state2		s2
+#define state3		s3
+#define state4		s4
+#define state5		s5
+#define state6		s6
+#define state7		s7
+#define state8		s8
+#define state9		s9
+#define state10		a5
+#define state11		a6
+#define state12		a7
+#define state13		t0
+#define state14		t1
+#define state15		t2
+#define cnt_lo		t3
+#define cnt_hi		t4
+#define copy0		t5
+#define copy1		t6
+#define copy2		t7
+
+/* Reuse i as copy3 */
+#define copy3		i
+
+	/*
+	 * The ABI requires s0-s9 saved, and sp aligned to 16-byte.
+	 * This does not violate the stack-less requirement: no sensitive data
+	 * is spilled onto the stack.
+	 */
+	PTR_ADDI	sp, sp, (-SZREG * 10) & STACK_ALIGN
+	REG_S		s0, sp, 0
+	REG_S		s1, sp, SZREG
+	REG_S		s2, sp, SZREG * 2
+	REG_S		s3, sp, SZREG * 3
+	REG_S		s4, sp, SZREG * 4
+	REG_S		s5, sp, SZREG * 5
+	REG_S		s6, sp, SZREG * 6
+	REG_S		s7, sp, SZREG * 7
+	REG_S		s8, sp, SZREG * 8
+	REG_S		s9, sp, SZREG * 9
+
+	li.w		copy0, 0x61707865
+	li.w		copy1, 0x3320646e
+	li.w		copy2, 0x79622d32
+
+	ld.w		cnt_lo, counter, 0
+	ld.w		cnt_hi, counter, 4
+
+.Lblock:
+	/* state[0,1,2,3] = "expand 32-byte k" */
+	move		state0, copy0
+	move		state1, copy1
+	move		state2, copy2
+	li.w		state3, 0x6b206574
+
+	/* state[4,5,..,11] = key */
+	ld.w		state4, key, 0
+	ld.w		state5, key, 4
+	ld.w		state6, key, 8
+	ld.w		state7, key, 12
+	ld.w		state8, key, 16
+	ld.w		state9, key, 20
+	ld.w		state10, key, 24
+	ld.w		state11, key, 28
+
+	/* state[12,13] = counter */
+	move		state12, cnt_lo
+	move		state13, cnt_hi
+
+	/* state[14,15] = 0 */
+	move		state14, zero
+	move		state15, zero
+
+	li.w		i, 10
+.Lpermute:
+	/* odd round */
+	QR		state0, state4, state8, state12
+	QR		state1, state5, state9, state13
+	QR		state2, state6, state10, state14
+	QR		state3, state7, state11, state15
+
+	/* even round */
+	QR		state0, state5, state10, state15
+	QR		state1, state6, state11, state12
+	QR		state2, state7, state8, state13
+	QR		state3, state4, state9, state14
+
+	addi.w		i, i, -1
+	bnez		i, .Lpermute
+
+	/*
+	 * copy[3] = "expa", materialize it here because copy[3] shares the
+	 * same register with i which just became dead.
+	 */
+	li.w		copy3, 0x6b206574
+
+	/* output[0,1,2,3] = copy[0,1,2,3] + state[0,1,2,3] */
+	add.w		state0, state0, copy0
+	add.w		state1, state1, copy1
+	add.w		state2, state2, copy2
+	add.w		state3, state3, copy3
+	st.w		state0, output, 0
+	st.w		state1, output, 4
+	st.w		state2, output, 8
+	st.w		state3, output, 12
+
+	/* from now on state[0,1,2,3] are scratch registers  */
+
+	/* state[0,1,2,3] = lo32(key) */
+	ld.w		state0, key, 0
+	ld.w		state1, key, 4
+	ld.w		state2, key, 8
+	ld.w		state3, key, 12
+
+	/* output[4,5,6,7] = state[0,1,2,3] + state[4,5,6,7] */
+	add.w		state4, state4, state0
+	add.w		state5, state5, state1
+	add.w		state6, state6, state2
+	add.w		state7, state7, state3
+	st.w		state4, output, 16
+	st.w		state5, output, 20
+	st.w		state6, output, 24
+	st.w		state7, output, 28
+
+	/* state[0,1,2,3] = hi32(key) */
+	ld.w		state0, key, 16
+	ld.w		state1, key, 20
+	ld.w		state2, key, 24
+	ld.w		state3, key, 28
+
+	/* output[8,9,10,11] = state[0,1,2,3] + state[8,9,10,11] */
+	add.w		state8, state8, state0
+	add.w		state9, state9, state1
+	add.w		state10, state10, state2
+	add.w		state11, state11, state3
+	st.w		state8, output, 32
+	st.w		state9, output, 36
+	st.w		state10, output, 40
+	st.w		state11, output, 44
+
+	/* output[12,13,14,15] = state[12,13,14,15] + [cnt_lo, cnt_hi, 0, 0] */
+	add.w		state12, state12, cnt_lo
+	add.w		state13, state13, cnt_hi
+	st.w		state12, output, 48
+	st.w		state13, output, 52
+	st.w		state14, output, 56
+	st.w		state15, output, 60
+
+	/* ++counter  */
+	addi.w		cnt_lo, cnt_lo, 1
+	sltui		state0, cnt_lo, 1
+	add.w		cnt_hi, cnt_hi, state0
+
+	/* output += 64 */
+	PTR_ADDI	output, output, 64
+	/* --nblocks */
+	PTR_ADDI	nblocks, nblocks, -1
+	bnez		nblocks, .Lblock
+
+	/* counter = [cnt_lo, cnt_hi] */
+	st.w		cnt_lo, counter, 0
+	st.w		cnt_hi, counter, 4
+
+	/*
+	 * Zero out the potentially sensitive regs, in case nothing uses these
+	 * again. As of now copy[0,1,2,3] just contain "expand 32-byte k" and
+	 * state[0,...,9] are s0-s9, which we'll restore in the epilogue, so we
+	 * only need to zero state[10,...,15].
+	 */
+	move		state10, zero
+	move		state11, zero
+	move		state12, zero
+	move		state13, zero
+	move		state14, zero
+	move		state15, zero
+
+	REG_L		s0, sp, 0
+	REG_L		s1, sp, SZREG
+	REG_L		s2, sp, SZREG * 2
+	REG_L		s3, sp, SZREG * 3
+	REG_L		s4, sp, SZREG * 4
+	REG_L		s5, sp, SZREG * 5
+	REG_L		s6, sp, SZREG * 6
+	REG_L		s7, sp, SZREG * 7
+	REG_L		s8, sp, SZREG * 8
+	REG_L		s9, sp, SZREG * 9
+	PTR_ADDI	sp, sp, -((-SZREG * 10) & STACK_ALIGN)
+
+	jr		ra
+SYM_FUNC_END(__arch_chacha20_blocks_nostack)
diff --git a/arch/loongarch/vdso/vgetrandom.c b/arch/loongarch/vdso/vgetrandom.c
new file mode 100644
index 000000000000..68e44b3f1d49
--- /dev/null
+++ b/arch/loongarch/vdso/vgetrandom.c
@@ -0,0 +1,15 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Xi Ruoyao <xry111@xry111.site>. All Rights Reserved.
+ */
+#include <linux/types.h>
+
+ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags,
+			 void *opaque_state, size_t opaque_len);
+
+ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags,
+			 void *opaque_state, size_t opaque_len)
+{
+	return __cvdso_getrandom(buffer, len, flags, opaque_state,
+				 opaque_len);
+}
diff --git a/tools/arch/loongarch/vdso b/tools/arch/loongarch/vdso
new file mode 120000
index 000000000000..ebda43a82db7
--- /dev/null
+++ b/tools/arch/loongarch/vdso
@@ -0,0 +1 @@ 
+../../../arch/loongarch/vdso
\ No newline at end of file
diff --git a/tools/testing/selftests/vDSO/Makefile b/tools/testing/selftests/vDSO/Makefile
index e21e78aae24d..606ce5f5c2a4 100644
--- a/tools/testing/selftests/vDSO/Makefile
+++ b/tools/testing/selftests/vDSO/Makefile
@@ -1,6 +1,6 @@ 
 # SPDX-License-Identifier: GPL-2.0
 uname_M := $(shell uname -m 2>/dev/null || echo not)
-ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
+ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/ -e /loongarch/s/[0-9]//g)
 
 TEST_GEN_PROGS := vdso_test_gettimeofday
 TEST_GEN_PROGS += vdso_test_getcpu
@@ -10,7 +10,7 @@  ifeq ($(ARCH),$(filter $(ARCH),x86 x86_64))
 TEST_GEN_PROGS += vdso_standalone_test_x86
 endif
 TEST_GEN_PROGS += vdso_test_correctness
-ifeq ($(uname_M),x86_64)
+ifeq ($(uname_M),$(filter $(uname_M),x86_64 loongarch64))
 TEST_GEN_PROGS += vdso_test_getrandom
 TEST_GEN_PROGS += vdso_test_chacha
 endif