[0/8] arm64: Optimise and update memcpy, user copy and string routines

Message ID: cover.1571073960.git.robin.murphy@arm.com

Message

Robin Murphy Oct. 15, 2019, 3:49 p.m. UTC
[ Since I've taken over this series just for the final upstream polish,
  I've left Sam's original cover letter below. Other than cosmetic
  cleanups, I ended up squashing the original first patch since it had
  become overwhelmingly redundant, and dropping the memset patch where
  we'd both initially managed to overlook the sneaky use of a Q register.

  Linaro have kindly given us permission to contribute Cortex Strings
  updates to Linux under GPLv2, as per their original submission.

  Robin. ]

This patch series optimises the arm64 memcpy, copy_to_user, copy_from_user,
copy_in_user, memcmp, memmove, memset, strcmp, strlen and strncmp routines by
importing the latest Cortex Strings implementation.

The first patch renames and reimplements the existing macros to use offset
addressing and adds post-index versions for the existing code that relies on
that addressing mode. The second patch imports the Cortex Strings implementation
and removes the uao_{stp,ldp}_post macros introduced in the first patch, as they
are no longer needed. The final patch updates the fixup handlers so that they
can calculate the remaining number of bytes to be copied without using
post-index addressing.
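
As a minimal illustration of the two addressing forms in question (hypothetical
example instructions, not the series' actual macro bodies):

	ldp	x6, x7, [x1, #16]	// offset: load from x1+16, x1 unchanged
	ldp	x8, x9, [x1], #16	// post-index: load from x1, then x1 += 16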

When testing (detailed below), these changes give the following speedups:
  * copy_from_user: 13.17%
  * copy_to_user: 4.8%
  * memcpy: 27.88%
  * copy_in_user: Didn't appear in the test results.

Testing was done by booting a kernel with the changed implementations and
running perf record on a defconfig kernel build from within a 3GB ramdisk.
perf report was then run on the generated data and the number of samples
spent in each routine was noted. The same process was repeated for a kernel
built from the latest master.

The fault handler was updated to provide the faulting address in x15 when the
fixup handler offset has its LSB set. The user memcpy routines opt in to this
behaviour by adding one to their fixup handler offsets, similar to the
approach taken by the sparc fault handler.
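
As a rough, illustrative sketch of the assembly side of this (the series' real
_asm_extable_faultaddr definition may differ), the extable fixup offset is
simply biased by one so that the fault handler can spot the request:

	.macro	_asm_extable_faultaddr, insn, fixup
	.pushsection	__ex_table, "a"
	.align	3
	.long	(\insn - .), (\fixup - . + 1)	// LSB set: fault address wanted in x15
	.popsection
	.endm

Because the extable entry and the fixup code are both at least 4-byte aligned,
the low bit of the relative fixup offset is otherwise always zero, leaving it
free to carry this flag.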

Robin Murphy (1):
  arm64: Tidy up _asm_extable_faultaddr usage

Sam Tebbs (7):
  arm64: Allow passing fault address to fixup handlers
  arm64: Import latest Cortex Strings memcpy implementation
  arm64: Import latest version of Cortex Strings' memcmp
  arm64: Import latest version of Cortex Strings' memmove
  arm64: Import latest version of Cortex Strings' strcmp
  arm64: Import latest version of Cortex Strings' strlen
  arm64: Import latest version of Cortex Strings' strncmp

 arch/arm64/include/asm/alternative.h |  36 ---
 arch/arm64/include/asm/assembler.h   |  13 +
 arch/arm64/include/asm/extable.h     |  10 +-
 arch/arm64/lib/copy_from_user.S      | 103 ++++++--
 arch/arm64/lib/copy_in_user.S        | 106 ++++++--
 arch/arm64/lib/copy_template.S       | 304 ++++++++++-----------
 arch/arm64/lib/copy_template_user.S  |  24 ++
 arch/arm64/lib/copy_to_user.S        | 102 +++++--
 arch/arm64/lib/copy_user_fixup.S     |  14 +
 arch/arm64/lib/memcmp.S              | 317 ++++++++--------------
 arch/arm64/lib/memcpy.S              |  48 ++--
 arch/arm64/lib/memmove.S             | 236 ++++++-----------
 arch/arm64/lib/strcmp.S              | 278 ++++++++------------
 arch/arm64/lib/strlen.S              | 249 ++++++++++++------
 arch/arm64/lib/strncmp.S             | 379 ++++++++++++---------------
 arch/arm64/mm/extable.c              |  13 +-
 arch/arm64/mm/fault.c                |   2 +-
 17 files changed, 1125 insertions(+), 1109 deletions(-)
 create mode 100644 arch/arm64/lib/copy_template_user.S
 create mode 100644 arch/arm64/lib/copy_user_fixup.S

Comments

Catalin Marinas Oct. 17, 2019, 11:21 a.m. UTC | #1
On Tue, Oct 15, 2019 at 04:49:55PM +0100, Robin Murphy wrote:
> Robin Murphy (1):
>   arm64: Tidy up _asm_extable_faultaddr usage
> 
> Sam Tebbs (7):
>   arm64: Allow passing fault address to fixup handlers
>   arm64: Import latest Cortex Strings memcpy implementation
>   arm64: Import latest version of Cortex Strings' memcmp
>   arm64: Import latest version of Cortex Strings' memmove
>   arm64: Import latest version of Cortex Strings' strcmp
>   arm64: Import latest version of Cortex Strings' strlen
>   arm64: Import latest version of Cortex Strings' strncmp

Thanks Robin. I merged these patches into the arm64
for-next/cortex-strings branch (and for-next/core) to give them some
exposure in -next. If nothing breaks, I'll push them into 5.5.
Catalin Marinas Oct. 18, 2019, 7:54 a.m. UTC | #2
Hi Robin,

On Tue, Oct 15, 2019 at 04:49:55PM +0100, Robin Murphy wrote:
> Robin Murphy (1):
>   arm64: Tidy up _asm_extable_faultaddr usage
> 
> Sam Tebbs (7):
>   arm64: Allow passing fault address to fixup handlers
>   arm64: Import latest Cortex Strings memcpy implementation
>   arm64: Import latest version of Cortex Strings' memcmp
>   arm64: Import latest version of Cortex Strings' memmove
>   arm64: Import latest version of Cortex Strings' strcmp
>   arm64: Import latest version of Cortex Strings' strlen
>   arm64: Import latest version of Cortex Strings' strncmp

Apart from the kprobes build failure (patch available already), I found
two more:

- with CONFIG_KASAN enabled:

arch/arm64/lib/memmove.o: in function `__pi_memmove':
arch/arm64/lib/memmove.S:57:(.text+0xc): relocation truncated to fit: R_AARCH64_CONDBR19 against symbol `memcpy' defined in .text section in mm/kasan/common.o

- big endian (I think kbuild robot also reported this):

arch/arm64/lib/strcmp.S: Assembler messages:
arch/arm64/lib/strcmp.S:118: Error: immediate value out of range 0 to 63 at operand 3 -- `lsr x2,x2,#560'

I'll drop the series for now (already removed it from for-next/core
yesterday) until the above are addressed.
Robin Murphy Oct. 18, 2019, 9:28 a.m. UTC | #3
On 2019-10-18 8:54 am, Catalin Marinas wrote:
> Hi Robin,
> 
> On Tue, Oct 15, 2019 at 04:49:55PM +0100, Robin Murphy wrote:
>> Robin Murphy (1):
>>    arm64: Tidy up _asm_extable_faultaddr usage
>>
>> Sam Tebbs (7):
>>    arm64: Allow passing fault address to fixup handlers
>>    arm64: Import latest Cortex Strings memcpy implementation
>>    arm64: Import latest version of Cortex Strings' memcmp
>>    arm64: Import latest version of Cortex Strings' memmove
>>    arm64: Import latest version of Cortex Strings' strcmp
>>    arm64: Import latest version of Cortex Strings' strlen
>>    arm64: Import latest version of Cortex Strings' strncmp
> 
> Apart from the kprobes build failure (patch available already), I found
> two more:
> 
> - with CONFIG_KASAN enabled:
> 
> arch/arm64/lib/memmove.o: in function `__pi_memmove':
> arch/arm64/lib/memmove.S:57:(.text+0xc): relocation truncated to fit: R_AARCH64_CONDBR19 against symbol `memcpy' defined in .text section in mm/kasan/common.o
> 
> - big endian (I think kbuild robot also reported this):
> 
> arch/arm64/lib/strcmp.S: Assembler messages:
> arch/arm64/lib/strcmp.S:118: Error: immediate value out of range 0 to 63 at operand 3 -- `lsr x2,x2,#560'
> 
> I'll drop the series for now (already removed it from for-next/core
> yesterday) until the above are addressed.

Thanks Catalin - I've already fixed the big-endian typo locally, and for 
the KASAN thing it seems we probably just overlooked an 
s/memcpy/__memcpy/ conversion, but I'll double-check before resending.

Robin.