diff mbox series

[v8,3/3] x86: vdso: Wire up getrandom() vDSO implementation

Message ID 20221128111829.2477505-4-Jason@zx2c4.com (mailing list archive)
State Not Applicable
Delegated to: Herbert Xu
Series implement getrandom() in vDSO

Commit Message

Jason A. Donenfeld Nov. 28, 2022, 11:18 a.m. UTC
Hook up the generic vDSO implementation to the x86 vDSO data page. Since
the existing vDSO infrastructure is heavily based on the timekeeping
functionality, which works over arrays of bases, a new macro is
introduced for vvars that are not arrays.

Also enable the vgetrandom_alloc() syscall, which the vDSO
implementation relies on.

The vDSO function requires a ChaCha20 implementation that does not write
to the stack, yet can still do an entire ChaCha20 permutation, so
provide this using SSE2, since this is userland code that must work on
all x86-64 processors.

Reviewed-by: Samuel Neves <sneves@dei.uc.pt> # for vgetrandom-chacha.S
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 arch/x86/Kconfig                        |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl  |   1 +
 arch/x86/entry/vdso/Makefile            |   3 +-
 arch/x86/entry/vdso/vdso.lds.S          |   2 +
 arch/x86/entry/vdso/vgetrandom-chacha.S | 177 ++++++++++++++++++++++++
 arch/x86/entry/vdso/vgetrandom.c        |  17 +++
 arch/x86/include/asm/unistd.h           |   1 +
 arch/x86/include/asm/vdso/getrandom.h   |  55 ++++++++
 arch/x86/include/asm/vdso/vsyscall.h    |   2 +
 arch/x86/include/asm/vvar.h             |  16 +++
 10 files changed, 274 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/entry/vdso/vgetrandom-chacha.S
 create mode 100644 arch/x86/entry/vdso/vgetrandom.c
 create mode 100644 arch/x86/include/asm/vdso/getrandom.h
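For cross-checking the SSE2 routine described in the commit message, a minimal portable C sketch of the same computation follows: ChaCha20 with a zero nonce, a 32-byte key, and an 8-byte counter advanced once per 64-byte block. The name chacha20_blocks_ref and its helpers are hypothetical and not part of this patch. Unlike the vDSO routine, this version freely uses the stack, so it is only meant as a test oracle, and it assumes a little-endian host such as x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rotl32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter round: the same add/xor/rotate steps the assembly comments list. */
static inline void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
	*a += *b; *d = rotl32(*d ^ *a, 16);
	*c += *d; *b = rotl32(*b ^ *c, 12);
	*a += *b; *d = rotl32(*d ^ *a, 8);
	*c += *d; *b = rotl32(*b ^ *c, 7);
}

/*
 * Hypothetical reference helper (not in the patch): writes nblocks * 64 bytes
 * of ChaCha20 keystream to dst with nonce = 0, then stores the advanced
 * 64-bit counter back into counter[0] (low word) and counter[1] (high word).
 */
static void chacha20_blocks_ref(uint8_t *dst, const uint32_t key[8],
				uint32_t counter[2], size_t nblocks)
{
	/* "expand 32-byte k" as four little-endian words. */
	static const uint32_t sigma[4] = {
		0x61707865, 0x3320646e, 0x79622d32, 0x6b206574
	};
	uint32_t in[16], x[16];
	size_t i;

	assert(nblocks > 0);	/* the assembly also requires a positive count */

	for (i = 0; i < 4; i++)
		in[i] = sigma[i];
	for (i = 0; i < 8; i++)
		in[4 + i] = key[i];
	in[12] = counter[0];
	in[13] = counter[1];
	in[14] = 0;		/* zero nonce */
	in[15] = 0;

	while (nblocks--) {
		for (i = 0; i < 16; i++)
			x[i] = in[i];

		/* 10 double rounds = 20 rounds. */
		for (i = 0; i < 10; i++) {
			/* Column round. */
			quarter_round(&x[0], &x[4], &x[8],  &x[12]);
			quarter_round(&x[1], &x[5], &x[9],  &x[13]);
			quarter_round(&x[2], &x[6], &x[10], &x[14]);
			quarter_round(&x[3], &x[7], &x[11], &x[15]);
			/* Diagonal round. */
			quarter_round(&x[0], &x[5], &x[10], &x[15]);
			quarter_round(&x[1], &x[6], &x[11], &x[12]);
			quarter_round(&x[2], &x[7], &x[8],  &x[13]);
			quarter_round(&x[3], &x[4], &x[9],  &x[14]);
		}

		/* Feed-forward, then serialize each word as little-endian bytes. */
		for (i = 0; i < 16; i++) {
			uint32_t v = x[i] + in[i];

			dst[4 * i + 0] = (uint8_t)(v >> 0);
			dst[4 * i + 1] = (uint8_t)(v >> 8);
			dst[4 * i + 2] = (uint8_t)(v >> 16);
			dst[4 * i + 3] = (uint8_t)(v >> 24);
		}
		dst += 64;

		/* 64-bit counter increment with carry into the high word. */
		if (++in[12] == 0)
			++in[13];
	}

	counter[0] = in[12];
	counter[1] = in[13];
}
```

With an all-zero key and an all-zero counter, one block from this sketch should start with the bytes 76 b8 e0 ad, the widely published zero-key ChaCha20 test vector, and leave the counter at 1.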

Comments

Arnd Bergmann Nov. 28, 2022, 7:18 p.m. UTC | #1
On Mon, Nov 28, 2022, at 12:18, Jason A. Donenfeld wrote:
> Hook up the generic vDSO implementation to the x86 vDSO data page. Since
> the existing vDSO infrastructure is heavily based on the timekeeping
> functionality, which works over arrays of bases, a new macro is
> introduced for vvars that are not arrays.
>
> Also enable the vgetrandom_alloc() syscall, which the vDSO
> implementation relies on.
>
> The vDSO function requires a ChaCha20 implementation that does not write
> to the stack, yet can still do an entire ChaCha20 permutation, so
> provide this using SSE2, since this is userland code that must work on
> all x86-64 processors.
>
> Reviewed-by: Samuel Neves <sneves@dei.uc.pt> # for vgetrandom-chacha.S
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  arch/x86/Kconfig                        |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl  |   1 +

I see that this enables the syscall in x86-64, while patch 1
adds it to the eight architectures that use
include/uapi/asm-generic/unistd.h (with the __ARCH_WANT_*
guard at the moment, but you already said that will be removed)

I think ideally the syscall.tbl and unistd.h changes should be done
in one patch for all architectures that doesn't mix it with
any other changes. In particular I think it should be separate
from the vdso changes, but could be in the patch that implements
the syscall.

      Arnd
Jason A. Donenfeld Nov. 28, 2022, 7:23 p.m. UTC | #2
Hi Arnd,

On Mon, Nov 28, 2022 at 08:18:12PM +0100, Arnd Bergmann wrote:
> On Mon, Nov 28, 2022, at 12:18, Jason A. Donenfeld wrote:
> > Hook up the generic vDSO implementation to the x86 vDSO data page. Since
> > the existing vDSO infrastructure is heavily based on the timekeeping
> > functionality, which works over arrays of bases, a new macro is
> > introduced for vvars that are not arrays.
> >
> > Also enable the vgetrandom_alloc() syscall, which the vDSO
> > implementation relies on.
> >
> > The vDSO function requires a ChaCha20 implementation that does not write
> > to the stack, yet can still do an entire ChaCha20 permutation, so
> > provide this using SSE2, since this is userland code that must work on
> > all x86-64 processors.
> >
> > Reviewed-by: Samuel Neves <sneves@dei.uc.pt> # for vgetrandom-chacha.S
> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> > ---
> >  arch/x86/Kconfig                        |   1 +
> >  arch/x86/entry/syscalls/syscall_64.tbl  |   1 +
> 
> I see that this enables the syscall in x86-64, while patch 1
> adds it to the eight architectures that use
> include/uapi/asm-generic/unistd.h (with the __ARCH_WANT_*
> guard at the moment, but you already said that will be removed)
> 
> I think ideally the syscall.tbl and unistd.h changes should be done
> in one patch for all architectures that doesn't mix it with
> any other changes. In particular I think it should be separate
> from the vdso changes, but could be in the patch that implements
> the syscall.

That's more or less how v7 was, but Thomas thought the x86 stuff should
be separate. So for v8, the organization is:

1) generic syscall
2) generic vdso
3) x86 wiring

The primary advantage is that future archs wanting to add this now can
just look at commit (3) only, and make a similar commit for that new
arch.

If you think a different organization outweighs that advantage, can you
spell out what division of patches you want, and I'll do that for v9?
Or maybe this v8 is okay?

Jason
Arnd Bergmann Nov. 28, 2022, 7:57 p.m. UTC | #3
On Mon, Nov 28, 2022, at 20:23, Jason A. Donenfeld wrote:
> On Mon, Nov 28, 2022 at 08:18:12PM +0100, Arnd Bergmann wrote:
>> On Mon, Nov 28, 2022, at 12:18, Jason A. Donenfeld wrote:
>
> That's more or less how v7 was, but Thomas thought the x86 stuff should
> be separate. So for v8, the organization is:
>
> 1) generic syscall
> 2) generic vdso
> 3) x86 wiring
>
> The primary advantage is that future archs wanting to add this now can
> just look at commit (3) only, and make a similar commit for that new
> arch.
>
> If you think a different organization outweighs that advantage, can you
> spell out what division of patches you want, and I'll do that for v9?
> Or maybe this v8 is okay?

My interest is that at the end of the series, all architectures
are hooked up with the same syscall number, which avoids confusion
and merge conflicts when we add the next syscall to all tables.

How about one patch to add all the syscall table entries, and then
have the x86 specific change just turn on the Kconfig symbol that
actually enables the syscall?

     Arnd
Jason A. Donenfeld Nov. 28, 2022, 8:02 p.m. UTC | #4
Hi Arnd,

On Mon, Nov 28, 2022 at 8:57 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Mon, Nov 28, 2022, at 20:23, Jason A. Donenfeld wrote:
> > On Mon, Nov 28, 2022 at 08:18:12PM +0100, Arnd Bergmann wrote:
> >> On Mon, Nov 28, 2022, at 12:18, Jason A. Donenfeld wrote:
> >
> > That's more or less how v7 was, but Thomas thought the x86 stuff should
> > be separate. So for v8, the organization is:
> >
> > 1) generic syscall
> > 2) generic vdso
> > 3) x86 wiring
> >
> > The primary advantage is that future archs wanting to add this now can
> > just look at commit (3) only, and make a similar commit for that new
> > arch.
> >
> > If you think a different organization outweighs that advantage, can you
> > spell out what division of patches you want, and I'll do that for v9?
> > Or maybe this v8 is okay?
>
> My interest is that at the end of the series, all architectures
> are hooked up with the same syscall number, which avoids confusion
> and merge conflicts when we add the next syscall to all tables.
>
> How about one patch to add all the syscall table entries, and then
> have the x86 specific change just turn on the Kconfig symbol that
> actually enables the syscall?

Okay, I can split it that way. If I gather your meaning correctly:

1) generic syscall C code
2) #define __NR_... in asm-generic/unistd.h x86/.../unistd.h,
x86/.../syscall_64.tbl
3) generic vdso C code
4) hook up x86 vdso, and select the right Kconfig symbol to start
compiling the code

Is that what you have in mind? If so, I'll name (2) "arch: wire up
vgetrandom_alloc() syscall number".

Jason
Jason A. Donenfeld Nov. 28, 2022, 8:41 p.m. UTC | #5
Hey again,

On Mon, Nov 28, 2022 at 9:02 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Arnd,
>
> On Mon, Nov 28, 2022 at 8:57 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Mon, Nov 28, 2022, at 20:23, Jason A. Donenfeld wrote:
> > > On Mon, Nov 28, 2022 at 08:18:12PM +0100, Arnd Bergmann wrote:
> > >> On Mon, Nov 28, 2022, at 12:18, Jason A. Donenfeld wrote:
> > >
> > > That's more or less how v7 was, but Thomas thought the x86 stuff should
> > > be separate. So for v8, the organization is:
> > >
> > > 1) generic syscall
> > > 2) generic vdso
> > > 3) x86 wiring
> > >
> > > The primary advantage is that future archs wanting to add this now can
> > > just look at commit (3) only, and make a similar commit for that new
> > > arch.
> > >
> > > If you think a different organization outweighs that advantage, can you
> > > spell out what division of patches you want, and I'll do that for v9?
> > > Or maybe this v8 is okay?
> >
> > My interest is that at the end of the series, all architectures
> > are hooked up with the same syscall number, which avoids confusion
> > and merge conflicts when we add the next syscall to all tables.
> >
> > How about one patch to add all the syscall table entries, and then
> > have the x86 specific change just turn on the Kconfig symbol that
> > actually enables the syscall?
>
> Okay, I can split it that way. If I gather your meaning correctly:
>
> 1) generic syscall C code
> 2) #define __NR_... in asm-generic/unistd.h x86/.../unistd.h,
> x86/.../syscall_64.tbl
> 3) generic vdso C code
> 4) hook up x86 vdso, and select the right Kconfig symbol to start
> compiling the code
>
> Is that what you have in mind? If so, I'll name (2) "arch: wire up
> vgetrandom_alloc() syscall number".

Well, I just did this, and it seems clean enough. The result is in:
https://git.zx2c4.com/linux-rng/log/?h=vdso
if you're curious to poke at it ahead of v9.

Jason
Arnd Bergmann Nov. 28, 2022, 9:12 p.m. UTC | #6
On Mon, Nov 28, 2022, at 21:02, Jason A. Donenfeld wrote:
> On Mon, Nov 28, 2022 at 8:57 PM Arnd Bergmann <arnd@arndb.de> wrote:

> Okay, I can split it that way. If I gather your meaning correctly:
>
> 1) generic syscall C code
> 2) #define __NR_... in asm-generic/unistd.h x86/.../unistd.h,
> x86/.../syscall_64.tbl

I mean all syscall*.tbl files, not just x86. There are currently
eight architectures using asm-generic/unistd.h, the rest use
syscall.tbl.

> 3) generic vdso C code
> 4) hook up x86 vdso, and select the right Kconfig symbol to start
> compiling the code
>
> Is that what you have in mind? If so, I'll name (2) "arch: wire up
> vgetrandom_alloc() syscall number".

That sounds good, yes.

      Arnd
Jason A. Donenfeld Nov. 28, 2022, 9:29 p.m. UTC | #7
On Mon, Nov 28, 2022 at 10:13 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Mon, Nov 28, 2022, at 21:02, Jason A. Donenfeld wrote:
> > On Mon, Nov 28, 2022 at 8:57 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> > Okay, I can split it that way. If I gather your meaning correctly:
> >
> > 1) generic syscall C code
> > 2) #define __NR_... in asm-generic/unistd.h x86/.../unistd.h,
> > x86/.../syscall_64.tbl
>
> I mean all syscall*.tbl files, not just x86. There are currently
> eight architectures using asm-generic/unistd.h, the rest use
> syscall.tbl.

Oh okay, I'll add this to all of the *.tbl files.

Jason
Jason A. Donenfeld Nov. 28, 2022, 9:39 p.m. UTC | #8
On Mon, Nov 28, 2022 at 10:29:29PM +0100, Jason A. Donenfeld wrote:
> On Mon, Nov 28, 2022 at 10:13 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Mon, Nov 28, 2022, at 21:02, Jason A. Donenfeld wrote:
> > > On Mon, Nov 28, 2022 at 8:57 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > > Okay, I can split it that way. If I gather your meaning correctly:
> > >
> > > 1) generic syscall C code
> > > 2) #define __NR_... in asm-generic/unistd.h x86/.../unistd.h,
> > > x86/.../syscall_64.tbl
> >
> > I mean all syscall*.tbl files, not just x86. There are currently
> > eight architectures using asm-generic/unistd.h, the rest use
> > syscall.tbl.
> 
> Oh okay, I'll add this to all of the *.tbl files.

Alright, so, it's looking like this now:

commit 16751c0ac4efaf1cefd793a79c469f9d62ddb3ed
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Mon Nov 28 21:37:14 2022

    arch: allocate vgetrandom_alloc() syscall number

    Add vgetrandom_alloc() as syscall 451 (or 561 on alpha) by adding it to
    all of the various syscall.tbl and unistd.h files.

    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

 arch/alpha/kernel/syscalls/syscall.tbl              | 1 +
 arch/arm/tools/syscall.tbl                          | 1 +
 arch/arm64/include/asm/unistd32.h                   | 2 ++
 arch/ia64/kernel/syscalls/syscall.tbl               | 1 +
 arch/m68k/kernel/syscalls/syscall.tbl               | 1 +
 arch/microblaze/kernel/syscalls/syscall.tbl         | 1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl           | 1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl           | 1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl           | 1 +
 arch/parisc/kernel/syscalls/syscall.tbl             | 1 +
 arch/powerpc/kernel/syscalls/syscall.tbl            | 1 +
 arch/s390/kernel/syscalls/syscall.tbl               | 1 +
 arch/sh/kernel/syscalls/syscall.tbl                 | 1 +
 arch/sparc/kernel/syscalls/syscall.tbl              | 1 +
 arch/x86/entry/syscalls/syscall_32.tbl              | 1 +
 arch/x86/entry/syscalls/syscall_64.tbl              | 1 +
 arch/xtensa/kernel/syscalls/syscall.tbl             | 1 +
 include/uapi/asm-generic/unistd.h                   | 5 ++++-
 tools/include/uapi/asm-generic/unistd.h             | 5 ++++-
 tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl | 1 +
 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl  | 1 +
 tools/perf/arch/s390/entry/syscalls/syscall.tbl     | 1 +
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl   | 1 +
 23 files changed, 30 insertions(+), 2 deletions(-)

Patch

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 67745ceab0db..357148c4a3a4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -269,6 +269,7 @@  config X86
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_GENERIC_VDSO
+	select VDSO_GETRANDOM			if X86_64
 	select HOTPLUG_SMT			if SMP
 	select IRQ_FORCED_THREADING
 	select NEED_PER_CPU_EMBED_FIRST_CHUNK
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index c84d12608cd2..0186f173f0e8 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -372,6 +372,7 @@ 
 448	common	process_mrelease	sys_process_mrelease
 449	common	futex_waitv		sys_futex_waitv
 450	common	set_mempolicy_home_node	sys_set_mempolicy_home_node
+451	common	vgetrandom_alloc	sys_vgetrandom_alloc
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 3e88b9df8c8f..2de64e52236a 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -27,7 +27,7 @@  VDSO32-$(CONFIG_X86_32)		:= y
 VDSO32-$(CONFIG_IA32_EMULATION)	:= y
 
 # files to link into the vdso
-vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
+vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o vgetrandom.o vgetrandom-chacha.o
 vobjs32-y := vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
 vobjs32-y += vdso32/vclock_gettime.o
 vobjs-$(CONFIG_X86_SGX)	+= vsgx.o
@@ -104,6 +104,7 @@  CFLAGS_REMOVE_vclock_gettime.o = -pg
 CFLAGS_REMOVE_vdso32/vclock_gettime.o = -pg
 CFLAGS_REMOVE_vgetcpu.o = -pg
 CFLAGS_REMOVE_vsgx.o = -pg
+CFLAGS_REMOVE_vgetrandom.o = -pg
 
 #
 # X32 processes use x32 vDSO to access 64bit kernel data.
diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
index 4bf48462fca7..1919cc39277e 100644
--- a/arch/x86/entry/vdso/vdso.lds.S
+++ b/arch/x86/entry/vdso/vdso.lds.S
@@ -28,6 +28,8 @@  VERSION {
 		clock_getres;
 		__vdso_clock_getres;
 		__vdso_sgx_enter_enclave;
+		getrandom;
+		__vdso_getrandom;
 	local: *;
 	};
 }
diff --git a/arch/x86/entry/vdso/vgetrandom-chacha.S b/arch/x86/entry/vdso/vgetrandom-chacha.S
new file mode 100644
index 000000000000..91fbb7ac7af4
--- /dev/null
+++ b/arch/x86/entry/vdso/vgetrandom-chacha.S
@@ -0,0 +1,177 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <linux/linkage.h>
+#include <asm/frame.h>
+
+.section	.rodata.cst16.CONSTANTS, "aM", @progbits, 16
+.align 16
+CONSTANTS:	.octa 0x6b20657479622d323320646e61707865
+.text
+
+/*
+ * Very basic SSE2 implementation of ChaCha20. Produces a given positive number
+ * of blocks of output with a nonce of 0, taking an input key and 8-byte
+ * counter. Importantly does not spill to the stack. Its arguments are:
+ *
+ *	rdi: output bytes
+ *	rsi: 32-byte key input
+ *	rdx: 8-byte counter input/output
+ *	rcx: number of 64-byte blocks to write to output
+ */
+SYM_FUNC_START(__arch_chacha20_blocks_nostack)
+
+#define output  %rdi
+#define key     %rsi
+#define counter %rdx
+#define nblocks %rcx
+#define i       %al
+#define state0  %xmm0
+#define state1  %xmm1
+#define state2  %xmm2
+#define state3  %xmm3
+#define copy0   %xmm4
+#define copy1   %xmm5
+#define copy2   %xmm6
+#define copy3   %xmm7
+#define temp    %xmm8
+#define one     %xmm9
+
+	/* copy0 = "expand 32-byte k" */
+	movaps		CONSTANTS(%rip),copy0
+	/* copy1,copy2 = key */
+	movups		0x00(key),copy1
+	movups		0x10(key),copy2
+	/* copy3 = counter || zero nonce */
+	movq		0x00(counter),copy3
+	/* one = 1 || 0 */
+	movq		$1,%rax
+	movq		%rax,one
+
+.Lblock:
+	/* state0,state1,state2,state3 = copy0,copy1,copy2,copy3 */
+	movdqa		copy0,state0
+	movdqa		copy1,state1
+	movdqa		copy2,state2
+	movdqa		copy3,state3
+
+	movb		$10,i
+.Lpermute:
+	/* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */
+	paddd		state1,state0
+	pxor		state0,state3
+	movdqa		state3,temp
+	pslld		$16,temp
+	psrld		$16,state3
+	por		temp,state3
+
+	/* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */
+	paddd		state3,state2
+	pxor		state2,state1
+	movdqa		state1,temp
+	pslld		$12,temp
+	psrld		$20,state1
+	por		temp,state1
+
+	/* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */
+	paddd		state1,state0
+	pxor		state0,state3
+	movdqa		state3,temp
+	pslld		$8,temp
+	psrld		$24,state3
+	por		temp,state3
+
+	/* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */
+	paddd		state3,state2
+	pxor		state2,state1
+	movdqa		state1,temp
+	pslld		$7,temp
+	psrld		$25,state1
+	por		temp,state1
+
+	/* state1[0,1,2,3] = state1[1,2,3,0] */
+	pshufd		$0x39,state1,state1
+	/* state2[0,1,2,3] = state2[2,3,0,1] */
+	pshufd		$0x4e,state2,state2
+	/* state3[0,1,2,3] = state3[3,0,1,2] */
+	pshufd		$0x93,state3,state3
+
+	/* state0 += state1, state3 = rotl32(state3 ^ state0, 16) */
+	paddd		state1,state0
+	pxor		state0,state3
+	movdqa		state3,temp
+	pslld		$16,temp
+	psrld		$16,state3
+	por		temp,state3
+
+	/* state2 += state3, state1 = rotl32(state1 ^ state2, 12) */
+	paddd		state3,state2
+	pxor		state2,state1
+	movdqa		state1,temp
+	pslld		$12,temp
+	psrld		$20,state1
+	por		temp,state1
+
+	/* state0 += state1, state3 = rotl32(state3 ^ state0, 8) */
+	paddd		state1,state0
+	pxor		state0,state3
+	movdqa		state3,temp
+	pslld		$8,temp
+	psrld		$24,state3
+	por		temp,state3
+
+	/* state2 += state3, state1 = rotl32(state1 ^ state2, 7) */
+	paddd		state3,state2
+	pxor		state2,state1
+	movdqa		state1,temp
+	pslld		$7,temp
+	psrld		$25,state1
+	por		temp,state1
+
+	/* state1[0,1,2,3] = state1[3,0,1,2] */
+	pshufd		$0x93,state1,state1
+	/* state2[0,1,2,3] = state2[2,3,0,1] */
+	pshufd		$0x4e,state2,state2
+	/* state3[0,1,2,3] = state3[1,2,3,0] */
+	pshufd		$0x39,state3,state3
+
+	decb		i
+	jnz		.Lpermute
+
+	/* output0 = state0 + copy0 */
+	paddd		copy0,state0
+	movups		state0,0x00(output)
+	/* output1 = state1 + copy1 */
+	paddd		copy1,state1
+	movups		state1,0x10(output)
+	/* output2 = state2 + copy2 */
+	paddd		copy2,state2
+	movups		state2,0x20(output)
+	/* output3 = state3 + copy3 */
+	paddd		copy3,state3
+	movups		state3,0x30(output)
+
+	/* ++copy3.counter */
+	paddq		one,copy3
+
+	/* output += 64, --nblocks */
+	addq		$64,output
+	decq		nblocks
+	jnz		.Lblock
+
+	/* counter = copy3.counter */
+	movq		copy3,0x00(counter)
+
+	/* Zero out the potentially sensitive regs, in case nothing uses these again. */
+	pxor		state0,state0
+	pxor		state1,state1
+	pxor		state2,state2
+	pxor		state3,state3
+	pxor		copy1,copy1
+	pxor		copy2,copy2
+	pxor		temp,temp
+
+	ret
+SYM_FUNC_END(__arch_chacha20_blocks_nostack)
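Note on the pshufd immediates used in vgetrandom-chacha.S: each 2-bit field of the immediate selects the source lane for one destination lane. The small C model below is a sketch for checking which lanes 0x39, 0x4e and 0x93 pick; pshufd_model is a hypothetical name, not something in the patch.

```c
#include <stdint.h>

/*
 * Hypothetical model of SSE2 pshufd on four 32-bit lanes:
 * destination lane i receives source lane ((imm >> (2 * i)) & 3).
 * A temporary is used so that src and dst may alias, as they do
 * in "pshufd $imm,stateN,stateN".
 */
static void pshufd_model(uint8_t imm, const uint32_t src[4], uint32_t dst[4])
{
	uint32_t tmp[4];
	int i;

	for (i = 0; i < 4; i++)
		tmp[i] = src[(imm >> (2 * i)) & 3];
	for (i = 0; i < 4; i++)
		dst[i] = tmp[i];
}
```

Feeding the lane indices {0, 1, 2, 3} through the model gives {1, 2, 3, 0} for 0x39, {2, 3, 0, 1} for 0x4e and {3, 0, 1, 2} for 0x93. That is a rotation of the three lower rows by one, two and three lanes, which lines the diagonals up as columns so the same quarter-round code can be reused; the second group of shuffles applies the inverse rotations.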
diff --git a/arch/x86/entry/vdso/vgetrandom.c b/arch/x86/entry/vdso/vgetrandom.c
new file mode 100644
index 000000000000..6045ded5da90
--- /dev/null
+++ b/arch/x86/entry/vdso/vgetrandom.c
@@ -0,0 +1,17 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2022 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+#include <linux/types.h>
+
+#include "../../../../lib/vdso/getrandom.c"
+
+ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags, void *state);
+
+ssize_t __vdso_getrandom(void *buffer, size_t len, unsigned int flags, void *state)
+{
+	return __cvdso_getrandom(buffer, len, flags, state);
+}
+
+ssize_t getrandom(void *, size_t, unsigned int, void *)
+	__attribute__((weak, alias("__vdso_getrandom")));
diff --git a/arch/x86/include/asm/unistd.h b/arch/x86/include/asm/unistd.h
index 761173ccc33c..1bf509eaeff1 100644
--- a/arch/x86/include/asm/unistd.h
+++ b/arch/x86/include/asm/unistd.h
@@ -27,6 +27,7 @@ 
 #  define __ARCH_WANT_COMPAT_SYS_PWRITEV64
 #  define __ARCH_WANT_COMPAT_SYS_PREADV64V2
 #  define __ARCH_WANT_COMPAT_SYS_PWRITEV64V2
+#  define __ARCH_WANT_VGETRANDOM_ALLOC
 #  define X32_NR_syscalls (__NR_x32_syscalls)
 #  define IA32_NR_syscalls (__NR_ia32_syscalls)
 
diff --git a/arch/x86/include/asm/vdso/getrandom.h b/arch/x86/include/asm/vdso/getrandom.h
new file mode 100644
index 000000000000..a2bb2dc4443e
--- /dev/null
+++ b/arch/x86/include/asm/vdso/getrandom.h
@@ -0,0 +1,55 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2022 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+#ifndef __ASM_VDSO_GETRANDOM_H
+#define __ASM_VDSO_GETRANDOM_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/unistd.h>
+#include <asm/vvar.h>
+
+/**
+ * getrandom_syscall - invoke the getrandom() syscall
+ * @buffer:	destination buffer to fill with random bytes
+ * @len:	size of @buffer in bytes
+ * @flags:	zero or more GRND_* flags
+ * Returns the number of random bytes written to @buffer, or a negative value indicating an error.
+ */
+static __always_inline ssize_t getrandom_syscall(void *buffer, size_t len, unsigned int flags)
+{
+	long ret;
+
+	asm ("syscall" : "=a" (ret) :
+	     "0" (__NR_getrandom), "D" (buffer), "S" (len), "d" (flags) :
+	     "rcx", "r11", "memory");
+
+	return ret;
+}
+
+#define __vdso_rng_data (VVAR(_vdso_rng_data))
+
+static __always_inline const struct vdso_rng_data *__arch_get_vdso_rng_data(void)
+{
+	if (__vdso_data->clock_mode == VDSO_CLOCKMODE_TIMENS)
+		return (void *)&__vdso_rng_data + ((void *)&__timens_vdso_data - (void *)&__vdso_data);
+	return &__vdso_rng_data;
+}
+
+/**
+ * __arch_chacha20_blocks_nostack - generate ChaCha20 stream without using the stack
+ * @dst_bytes:	a destination buffer to hold @nblocks * 64 bytes of output
+ * @key:	32-byte input key
+ * @counter:	8-byte counter, read on input and updated on return
+ * @nblocks:	the number of blocks to generate
+ *
+ * Generates a given positive number of blocks of ChaCha20 output with nonce=0, and does not write to
+ * any stack or memory outside of the parameters passed to it. This way, there's no concern about
+ * stack data leaking into forked child processes.
+ */
+extern void __arch_chacha20_blocks_nostack(u8 *dst_bytes, const u32 *key, u32 *counter, size_t nblocks);
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_GETRANDOM_H */
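Note on getrandom_syscall() in getrandom.h: the inline asm issues the raw getrandom() syscall and hands back the kernel's return value, which is negative on error. A rough userspace equivalent can be sketched with the libc syscall() wrapper, which instead returns -1 and sets errno. The helper name getrandom_syscall_sketch is hypothetical, and the sketch assumes a Linux libc that defines SYS_getrandom.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace counterpart of the vDSO's fallback path.
 * Returns the number of bytes written, or a negative errno value,
 * matching the convention documented for getrandom_syscall().
 */
static ssize_t getrandom_syscall_sketch(void *buffer, size_t len, unsigned int flags)
{
	long ret;

	assert(buffer != NULL || len == 0);

	ret = syscall(SYS_getrandom, buffer, len, flags);
	/* syscall() reports failure as -1 with errno set; convert it back. */
	return ret < 0 ? -errno : (ssize_t)ret;
}
```

Calling getrandom_syscall_sketch(buf, 16, 0) on a system whose RNG is initialized is expected to fill all 16 bytes and return 16.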
diff --git a/arch/x86/include/asm/vdso/vsyscall.h b/arch/x86/include/asm/vdso/vsyscall.h
index be199a9b2676..71c56586a22f 100644
--- a/arch/x86/include/asm/vdso/vsyscall.h
+++ b/arch/x86/include/asm/vdso/vsyscall.h
@@ -11,6 +11,8 @@ 
 #include <asm/vvar.h>
 
 DEFINE_VVAR(struct vdso_data, _vdso_data);
+DEFINE_VVAR_SINGLE(struct vdso_rng_data, _vdso_rng_data);
+
 /*
  * Update the vDSO data page to keep in sync with kernel timekeeping.
  */
diff --git a/arch/x86/include/asm/vvar.h b/arch/x86/include/asm/vvar.h
index 183e98e49ab9..9d9af37f7cab 100644
--- a/arch/x86/include/asm/vvar.h
+++ b/arch/x86/include/asm/vvar.h
@@ -26,6 +26,8 @@ 
  */
 #define DECLARE_VVAR(offset, type, name) \
 	EMIT_VVAR(name, offset)
+#define DECLARE_VVAR_SINGLE(offset, type, name) \
+	EMIT_VVAR(name, offset)
 
 #else
 
@@ -37,6 +39,10 @@  extern char __vvar_page;
 	extern type timens_ ## name[CS_BASES]				\
 	__attribute__((visibility("hidden")));				\
 
+#define DECLARE_VVAR_SINGLE(offset, type, name)				\
+	extern type vvar_ ## name					\
+	__attribute__((visibility("hidden")));				\
+
 #define VVAR(name) (vvar_ ## name)
 #define TIMENS(name) (timens_ ## name)
 
@@ -44,12 +50,22 @@  extern char __vvar_page;
 	type name[CS_BASES]						\
 	__attribute__((section(".vvar_" #name), aligned(16))) __visible
 
+#define DEFINE_VVAR_SINGLE(type, name)					\
+	type name							\
+	__attribute__((section(".vvar_" #name), aligned(16))) __visible
+
 #endif
 
 /* DECLARE_VVAR(offset, type, name) */
 
 DECLARE_VVAR(128, struct vdso_data, _vdso_data)
 
+#if !defined(_SINGLE_DATA)
+#define _SINGLE_DATA
+DECLARE_VVAR_SINGLE(640, struct vdso_rng_data, _vdso_rng_data)
+#endif
+
 #undef DECLARE_VVAR
+#undef DECLARE_VVAR_SINGLE
 
 #endif
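Note on __arch_get_vdso_rng_data(): inside a time namespace it shifts the address of the RNG data by the distance between __timens_vdso_data and __vdso_data, so the same offset is read in the other vvar page. The sketch below models only that pointer arithmetic, using the offsets 128 and 640 declared in vvar.h. The function name, the 4096-byte page size and the two adjacent pages in one buffer are assumptions made for illustration.

```c
#include <stddef.h>

/* Offsets taken from the DECLARE_VVAR lines in this patch. */
enum {
	VDSO_DATA_OFFSET     = 128,
	VDSO_RNG_DATA_OFFSET = 640,
	SKETCH_PAGE_SIZE     = 4096	/* assumption: 4 KiB pages */
};

/*
 * Hypothetical helper: return the address that sits at the same offset
 * from to_base as obj sits from from_base. All three pointers must
 * point into the same underlying buffer for this to be well defined.
 */
static const void *rebase_to_page(const void *obj, const void *from_base,
				  const void *to_base)
{
	ptrdiff_t delta = (const char *)to_base - (const char *)from_base;

	return (const char *)obj + delta;
}
```

For a buffer holding two consecutive pages, rebasing the address at offset 640 of the first page from offset 128 of the first page to offset 128 of the second page yields offset 640 of the second page.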