[v5,0/5] Wire up getrandom() vDSO implementation on powerpc

Message ID cover.1725304404.git.christophe.leroy@csgroup.eu (mailing list archive)

Message

Christophe Leroy Sept. 2, 2024, 7:17 p.m. UTC
This series wires up getrandom() vDSO implementation on powerpc.
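
For orientation, the benchmark figures in this message compare three ways of obtaining random bytes: the vDSO entry point, the libc wrapper, and the raw system call. The sketch below shows only the latter two paths from userspace, assuming Linux and a glibc that provides getrandom() in <sys/random.h>. The helper names fill_libc() and fill_syscall() are made up for illustration and are not part of this series or of the selftest.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/random.h>   /* getrandom() libc wrapper */
#include <sys/syscall.h>  /* SYS_getrandom */
#include <sys/types.h>    /* ssize_t */
#include <unistd.h>       /* syscall() */

/* "libc" path: go through the C library wrapper (hypothetical helper). */
static ssize_t fill_libc(void *buf, size_t len)
{
	return getrandom(buf, len, 0);
}

/* "syscall" path: enter the kernel directly, bypassing the wrapper. */
static ssize_t fill_syscall(void *buf, size_t len)
{
	return syscall(SYS_getrandom, buf, len, 0);
}
```

Both helpers return the number of bytes written, or -1 with errno set. The vDSO path avoids the kernel entry for most calls, which is where the difference in the timings comes from.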

Tested on PPC32 on real hardware.
Tested on PPC64 (both BE and LE) on QEMU:

Performance on powerpc 885:
	~# ./vdso_test_getrandom bench-single
	   vdso: 25000000 times in 62.938002291 seconds
	   libc: 25000000 times in 535.581916866 seconds
	syscall: 25000000 times in 531.525042806 seconds

Performance on powerpc 8321:
	~# ./vdso_test_getrandom bench-single
	   vdso: 25000000 times in 16.899318858 seconds
	   libc: 25000000 times in 131.050596522 seconds
	syscall: 25000000 times in 129.794790389 seconds

Performance on QEMU pseries:
	~ # ./vdso_test_getrandom bench-single
	   vdso: 25000000 times in 4.977777162 seconds
	   libc: 25000000 times in 75.516749981 seconds
	syscall: 25000000 times in 86.842242014 seconds
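
To put the figures above in per-call terms, the arithmetic can be sketched as follows. This is only a reading aid: the helper names ns_per_call() and speedup() are made up here and do not appear in vdso_test_getrandom.

```c
#include <stdint.h>

/* Average cost of one call in nanoseconds, given the total run time. */
static double ns_per_call(double total_seconds, uint64_t calls)
{
	return total_seconds * 1e9 / (double)calls;
}

/* How many times faster the vDSO run is than a baseline run. */
static double speedup(double baseline_seconds, double vdso_seconds)
{
	return baseline_seconds / vdso_seconds;
}
```

With the numbers reported above, ns_per_call(62.938002291, 25000000) gives roughly 2.5 microseconds per vDSO call on the 885, and speedup() against the syscall timings comes out at about 8.4x on the 885, 7.7x on the 8321 and 17.4x on QEMU pseries.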

Changes in v5:
- The last two patches are now split by VDSO32/VDSO64 rather than PPC32/PPC64
- Removed the stub returning ENOSYS
- Used meaningful names for registers
- Restored symbolic link that disappeared in v4

Changes in v4:
- Rebased on recent random git tree (963233ff0133) (The new tree includes selftests fixes)
- Read/write counter in native byte order
- No longer use compat macros to write output
- Fixed selftests build failure with patch 4 (without patch 5) on little endian on PPC64
- Implemented a __kernel_getrandom() stub returning ENOSYS on ppc64 in patch 4 (without patch 5) to make selftests happy.
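
As a rough illustration of the byte-order item above: a value that is only ever read and written on the same machine can stay in the CPU's own byte order, whereas data with a defined external layout needs an explicit order on both BE and LE. The sketch below contrasts the two kinds of access in portable C. The helper names are hypothetical and nothing here is taken from the patches, which do this work in assembly.

```c
#include <stdint.h>
#include <string.h>

/* Native order: copy the bytes exactly as the CPU lays them out. */
static uint32_t load_native32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

static void store_native32(void *p, uint32_t v)
{
	memcpy(p, &v, sizeof(v));
}

/* Fixed little-endian order: assemble the value byte by byte. */
static uint32_t load_le32(const void *p)
{
	const unsigned char *b = p;

	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
```

A native store followed by a native load round-trips on any CPU without byte swapping, while load_le32() returns the same value for the same bytes regardless of the CPU's endianness.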

Changes in v3:
- Rebased on recent random git tree (0c7e00e22c21)
- Fixed build failures reported by robots around VM_DROPPABLE
- Fixed crash on PPC64 due to clobbered r13 by not using r13 anymore (saving it was not enough for signals).
- Split final patch in two, first for PPC32, second for PPC64
- Moved selftest fixes out of this series

Changes in v2:
- Define VM_DROPPABLE for powerpc/32
- Fixed generic vDSO getrandom headers to enable CONFIG_COMPAT build
- Fixed size of generation counter
- Fixed selftests to work on non x86 architectures

Christophe Leroy (5):
  mm: Define VM_DROPPABLE for powerpc/32
  powerpc/vdso32: Add crtsavres
  powerpc/vdso: Refactor CFLAGS for CVDSO build
  powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32
  powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64

 arch/powerpc/Kconfig                         |   1 +
 arch/powerpc/include/asm/mman.h              |   2 +-
 arch/powerpc/include/asm/vdso/getrandom.h    |  54 +++
 arch/powerpc/include/asm/vdso/vsyscall.h     |   6 +
 arch/powerpc/include/asm/vdso_datapage.h     |   2 +
 arch/powerpc/kernel/asm-offsets.c            |   1 +
 arch/powerpc/kernel/vdso/Makefile            |  57 +--
 arch/powerpc/kernel/vdso/getrandom.S         |  58 +++
 arch/powerpc/kernel/vdso/gettimeofday.S      |  13 -
 arch/powerpc/kernel/vdso/vdso32.lds.S        |   1 +
 arch/powerpc/kernel/vdso/vdso64.lds.S        |   1 +
 arch/powerpc/kernel/vdso/vgetrandom-chacha.S | 365 +++++++++++++++++++
 arch/powerpc/kernel/vdso/vgetrandom.c        |  14 +
 fs/proc/task_mmu.c                           |   4 +-
 include/linux/mm.h                           |   4 +-
 include/trace/events/mmflags.h               |   4 +-
 tools/arch/powerpc/vdso                      |   1 +
 tools/testing/selftests/vDSO/Makefile        |   2 +-
 18 files changed, 547 insertions(+), 43 deletions(-)
 create mode 100644 arch/powerpc/include/asm/vdso/getrandom.h
 create mode 100644 arch/powerpc/kernel/vdso/getrandom.S
 create mode 100644 arch/powerpc/kernel/vdso/vgetrandom-chacha.S
 create mode 100644 arch/powerpc/kernel/vdso/vgetrandom.c
 create mode 120000 tools/arch/powerpc/vdso

Comments

Jason A. Donenfeld Sept. 4, 2024, 2:16 p.m. UTC | #1
Hi Christophe, Michael,

On Mon, Sep 02, 2024 at 09:17:17PM +0200, Christophe Leroy wrote:
> This series wires up getrandom() vDSO implementation on powerpc.
> 
> Tested on PPC32 on real hardware.
> Tested on PPC64 (both BE and LE) on QEMU:
> 
> Performance on powerpc 885:
> 	~# ./vdso_test_getrandom bench-single
> 	   vdso: 25000000 times in 62.938002291 seconds
> 	   libc: 25000000 times in 535.581916866 seconds
> 	syscall: 25000000 times in 531.525042806 seconds
> 
> Performance on powerpc 8321:
> 	~# ./vdso_test_getrandom bench-single
> 	   vdso: 25000000 times in 16.899318858 seconds
> 	   libc: 25000000 times in 131.050596522 seconds
> 	syscall: 25000000 times in 129.794790389 seconds
> 
> Performance on QEMU pseries:
> 	~ # ./vdso_test_getrandom bench-single
> 	   vdso: 25000000 times in 4.977777162 seconds
> 	   libc: 25000000 times in 75.516749981 seconds
> 	syscall: 25000000 times in 86.842242014 seconds

Looking good. I have no remaining nits on this patchset; it looks good
to me.

A review from Michael would be nice though (in addition to the necessary
"Ack" I need to commit this to my tree), because there are a lot of PPC
particulars that I don't know enough about to review properly. For
example, you use -ffixed-r30 on PPC64. I'm sure there's a good reason
for this, but I don't know enough to assess it. And cvdso_call I have no
idea what's going on. Etc.

But anyway, awesome work, and I look forward to the final stretches.

Jason

Christophe Leroy Sept. 4, 2024, 2:36 p.m. UTC | #2
On 04/09/2024 at 16:16, Jason A. Donenfeld wrote:
> Hi Christophe, Michael,
> 
> Looking good. I have no remaining nits on this patchset; it looks good
> to me.
> 
> A review from Michael would be nice though (in addition to the necessary
> "Ack" I need to commit this to my tree), because there are a lot of PPC
> particulars that I don't know enough about to review properly. For
> example, you use -ffixed-r30 on PPC64. I'm sure there's a good reason
> for this, but I don't know enough to assess it. And cvdso_call I have no
> idea what's going on. Etc.

You can learn a bit more about cvdso_call in commit ce7d8056e38b 
("powerpc/vdso: Prepare for switching VDSO to generic C implementation.")

About the fixed-r30, you can learn more in commit a88603f4b92e 
("powerpc/vdso: Don't use r30 to avoid breaking Go lang")


> 
> But anyway, awesome work, and I look forward to the final stretches.

Thanks, looking forward to getting this series applied.

Christophe

Michael Ellerman Sept. 5, 2024, 12:18 p.m. UTC | #3
"Jason A. Donenfeld" <Jason@zx2c4.com> writes:
> Hi Christophe, Michael,
>
> Looking good. I have no remaining nits on this patchset; it looks good
> to me.
>
> A review from Michael would be nice though (in addition to the necessary
> "Ack" I need to commit this to my tree), because there are a lot of PPC
> particulars that I don't know enough about to review properly. For
> example, you use -ffixed-r30 on PPC64. I'm sure there's a good reason
> for this, but I don't know enough to assess it. And cvdso_call I have no
> idea what's going on. Etc.
 
It all looks good to me, and has survived some testing. Let's get it
merged and get some wider test coverage.

There is an existing comment in the a/p/vdso/Makefile about the
fixed-r30 thing, tldr is it's a workaround to avoid breaking old
versions of Go.

For the series:

Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

If you can include Maddy's test results from Power9 in the change log
for patch 5 that'd be nice.

cheers

Jason A. Donenfeld Sept. 5, 2024, 12:56 p.m. UTC | #4
On Thu, Sep 05, 2024 at 10:18:40PM +1000, Michael Ellerman wrote:
> There is an existing comment in the a/p/vdso/Makefile about the
> fixed-r30 thing, tldr is it's a workaround to avoid breaking old
> versions of Go.

Thanks. Indeed, following Christophe's links yesterday, I tumbled down
that rabbit hole for a bit. Interesting how ABIs ossify unintentionally.


> For the series:
> 
> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

Excellent, queued up now.

> If you can include Maddy's test results from Power9 in the change log
> for patch 5 that'd be nice.

Was my plan exactly. I replaced the QEMU result with the PowerNV one.

Jason