[v4,1/2] RISC-V: Probe for unaligned access speed

Message ID	20230818194136.4084400-2-evan@rivosinc.com (mailing list archive)
State	Accepted
Commit	b98673c5b037b6b2fe0df68dd479f16267b7f26d
Headers	Return-Path: <linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org> From: Evan Green <evan@rivosinc.com> To: Palmer Dabbelt <palmer@rivosinc.com> Subject: [PATCH v4 1/2] RISC-V: Probe for unaligned access speed Date: Fri, 18 Aug 2023 12:41:35 -0700 Message-Id: <20230818194136.4084400-2-evan@rivosinc.com> In-Reply-To: <20230818194136.4084400-1-evan@rivosinc.com> References: <20230818194136.4084400-1-evan@rivosinc.com> MIME-Version: 1.0 Precedence: list Cc: Heiko Stuebner <heiko@sntech.de>, linux-doc@vger.kernel.org, =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= <bjorn@rivosinc.com>, Conor Dooley <conor.dooley@microchip.com>, Guo Ren <guoren@kernel.org>, Evan Green <evan@rivosinc.com>, Jisheng Zhang <jszhang@kernel.org>, linux-riscv@lists.infradead.org, Jonathan Corbet <corbet@lwn.net>, Sia Jee Heng <jeeheng.sia@starfivetech.com>, Marc Zyngier <maz@kernel.org>, Masahiro Yamada <masahiroy@kernel.org>, Greentime Hu <greentime.hu@sifive.com>, Simon Hosie <shosie@rivosinc.com>, Andrew Jones <ajones@ventanamicro.com>, Albert Ou <aou@eecs.berkeley.edu>, Alexandre Ghiti <alexghiti@rivosinc.com>, Ley Foon Tan <leyfoon.tan@starfivetech.com>, Paul Walmsley <paul.walmsley@sifive.com>, Anup Patel <apatel@ventanamicro.com>, linux-kernel@vger.kernel.org, Xianting Tian <xianting.tian@linux.alibaba.com>, David Laight <David.Laight@aculab.com>, Palmer Dabbelt <palmer@dabbelt.com>, Andy Chiu <andy.chiu@sifive.com> Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-riscv" <linux-riscv-bounces@lists.infradead.org> Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org
Series	RISC-V: Probe for misaligned access speed: [v4,0/2] RISC-V: Probe for misaligned access speed; [v4,1/2] RISC-V: Probe for unaligned access speed; [v4,2/2] RISC-V: alternative: Remove feature_probe_func

Context	Check	Description
conchuod/cover_letter	success	Series has a cover letter
conchuod/tree_selection	success	Guessed tree name to be for-next at HEAD 174e8ac0272d
conchuod/fixes_present	success	Fixes tag not required for -next series
conchuod/maintainers_pattern	success	MAINTAINERS pattern errors before the patch: 4 and now 4
conchuod/verify_signedoff	success	Signed-off-by tag matches author and committer
conchuod/kdoc	success	Errors and warnings before: 0 this patch: 0
conchuod/build_rv64_clang_allmodconfig	success	Errors and warnings before: 9 this patch: 9
conchuod/module_param	success	Was 0 now: 0
conchuod/build_rv64_gcc_allmodconfig	success	Errors and warnings before: 10 this patch: 10
conchuod/build_rv32_defconfig	success	Build OK
conchuod/dtb_warn_rv64	success	Errors and warnings before: 12 this patch: 12
conchuod/header_inline	success	No static functions without inline keyword in header files
conchuod/checkpatch	warning	CHECK: Consider using #include <linux/cpufeature.h> instead of <asm/cpufeature.h> WARNING: 'THead' may be misspelled - perhaps 'Thread'? WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? WARNING: memory barrier without comment
conchuod/build_rv64_nommu_k210_defconfig	success	Build OK
conchuod/verify_fixes	success	No Fixes tag
conchuod/build_rv64_nommu_virt_defconfig	success	Build OK

diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
index 19165ebd82ba..f63fd05f1a73 100644
--- a/Documentation/riscv/hwprobe.rst
+++ b/Documentation/riscv/hwprobe.rst
@@ -87,13 +87,12 @@ The following keys are defined:
     emulated via software, either in or below the kernel. These accesses are
     always extremely slow.
 
-  * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned accesses are supported
-    in hardware, but are slower than the cooresponding aligned accesses
-    sequences.
+  * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned accesses are slower
+    than equivalent byte accesses. Misaligned accesses may be supported
+    directly in hardware, or trapped and emulated by software.
 
-  * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned accesses are supported
-    in hardware and are faster than the cooresponding aligned accesses
-    sequences.
+  * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned accesses are faster
+    than equivalent byte accesses.
 
   * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned accesses are
     not supported at all and will generate a misaligned address fault.
diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index 23fed53b8815..d0345bd659c9 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -30,4 +30,6 @@ DECLARE_PER_CPU(long, misaligned_access_speed);
 /* Per-cpu ISA extensions. */
 extern struct riscv_isainfo hart_isa[NR_CPUS];
 
+void check_unaligned_access(int cpu);
+
 #endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 506cc4a9a45a..7e6c464cdfe9 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -38,6 +38,7 @@ extra-y += vmlinux.lds
 obj-y += head.o
 obj-y += soc.o
 obj-$(CONFIG_RISCV_ALTERNATIVE) += alternative.o
+obj-y += copy-unaligned.o
 obj-y += cpu.o
 obj-y += cpufeature.o
 obj-y += entry.o
diff --git a/arch/riscv/kernel/copy-unaligned.S b/arch/riscv/kernel/copy-unaligned.S
new file mode 100644
index 000000000000..cfdecfbaad62
--- /dev/null
+++ b/arch/riscv/kernel/copy-unaligned.S
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2023 Rivos Inc. */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+
+	.text
+
+/* void __riscv_copy_words_unaligned(void *, const void *, size_t) */
+/* Performs a memcpy without aligning buffers, using word loads and stores. */
+/* Note: The size is truncated to a multiple of 8 * SZREG */
+ENTRY(__riscv_copy_words_unaligned)
+	andi a4, a2, ~((8*SZREG)-1)
+	beqz a4, 2f
+	add a3, a1, a4
+1:
+	REG_L a4, 0(a1)
+	REG_L a5, SZREG(a1)
+	REG_L a6, 2*SZREG(a1)
+	REG_L a7, 3*SZREG(a1)
+	REG_L t0, 4*SZREG(a1)
+	REG_L t1, 5*SZREG(a1)
+	REG_L t2, 6*SZREG(a1)
+	REG_L t3, 7*SZREG(a1)
+	REG_S a4, 0(a0)
+	REG_S a5, SZREG(a0)
+	REG_S a6, 2*SZREG(a0)
+	REG_S a7, 3*SZREG(a0)
+	REG_S t0, 4*SZREG(a0)
+	REG_S t1, 5*SZREG(a0)
+	REG_S t2, 6*SZREG(a0)
+	REG_S t3, 7*SZREG(a0)
+	addi a0, a0, 8*SZREG
+	addi a1, a1, 8*SZREG
+	bltu a1, a3, 1b
+
+2:
+	ret
+END(__riscv_copy_words_unaligned)
+
+/* void __riscv_copy_bytes_unaligned(void *, const void *, size_t) */
+/* Performs a memcpy without aligning buffers, using only byte accesses. */
+/* Note: The size is truncated to a multiple of 8 */
+ENTRY(__riscv_copy_bytes_unaligned)
+	andi a4, a2, ~(8-1)
+	beqz a4, 2f
+	add a3, a1, a4
+1:
+	lb a4, 0(a1)
+	lb a5, 1(a1)
+	lb a6, 2(a1)
+	lb a7, 3(a1)
+	lb t0, 4(a1)
+	lb t1, 5(a1)
+	lb t2, 6(a1)
+	lb t3, 7(a1)
+	sb a4, 0(a0)
+	sb a5, 1(a0)
+	sb a6, 2(a0)
+	sb a7, 3(a0)
+	sb t0, 4(a0)
+	sb t1, 5(a0)
+	sb t2, 6(a0)
+	sb t3, 7(a0)
+	addi a0, a0, 8
+	addi a1, a1, 8
+	bltu a1, a3, 1b
+
+2:
+	ret
+END(__riscv_copy_bytes_unaligned)
diff --git a/arch/riscv/kernel/copy-unaligned.h b/arch/riscv/kernel/copy-unaligned.h
new file mode 100644
index 000000000000..e3d70d35b708
--- /dev/null
+++ b/arch/riscv/kernel/copy-unaligned.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023 Rivos, Inc.
+ */
+#ifndef __RISCV_KERNEL_COPY_UNALIGNED_H
+#define __RISCV_KERNEL_COPY_UNALIGNED_H
+
+#include <linux/types.h>
+
+void __riscv_copy_words_unaligned(void *dst, const void *src, size_t size);
+void __riscv_copy_bytes_unaligned(void *dst, const void *src, size_t size);
+
+#endif /* __RISCV_KERNEL_COPY_UNALIGNED_H */
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 71fb840ee246..72bbaf355067 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -19,12 +19,19 @@
 #include <asm/cacheflush.h>
 #include <asm/cpufeature.h>
 #include <asm/hwcap.h>
+#include <asm/hwprobe.h>
 #include <asm/patch.h>
 #include <asm/processor.h>
 #include <asm/vector.h>
 
+#include "copy-unaligned.h"
+
 #define NUM_ALPHA_EXTS ('z' - 'a' + 1)
 
+#define MISALIGNED_ACCESS_JIFFIES_LG2 1
+#define MISALIGNED_BUFFER_SIZE 0x4000
+#define MISALIGNED_COPY_SIZE ((MISALIGNED_BUFFER_SIZE / 2) - 0x80)
+
 unsigned long elf_hwcap __read_mostly;
 
 /* Host ISA bitmap */
@@ -555,6 +562,103 @@ unsigned long riscv_get_elf_hwcap(void)
 	return hwcap;
 }
 
+void check_unaligned_access(int cpu)
+{
+	u64 start_cycles, end_cycles;
+	u64 word_cycles;
+	u64 byte_cycles;
+	int ratio;
+	unsigned long start_jiffies, now;
+	struct page *page;
+	void *dst;
+	void *src;
+	long speed = RISCV_HWPROBE_MISALIGNED_SLOW;
+
+	page = alloc_pages(GFP_NOWAIT, get_order(MISALIGNED_BUFFER_SIZE));
+	if (!page) {
+		pr_warn("Can't alloc pages to measure memcpy performance");
+		return;
+	}
+
+	/* Make an unaligned destination buffer. */
+	dst = (void *)((unsigned long)page_address(page) | 0x1);
+	/* Unalign src as well, but differently (off by 1 + 2 = 3). */
+	src = dst + (MISALIGNED_BUFFER_SIZE / 2);
+	src += 2;
+	word_cycles = -1ULL;
+	/* Do a warmup. */
+	__riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+	preempt_disable();
+	start_jiffies = jiffies;
+	while ((now = jiffies) == start_jiffies)
+		cpu_relax();
+
+	/*
+	 * For a fixed amount of time, repeatedly try the function, and take
+	 * the best time in cycles as the measurement.
+	 */
+	while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
+		start_cycles = get_cycles64();
+		/* Ensure the CSR read can't reorder WRT to the copy. */
+		mb();
+		__riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+		/* Ensure the copy ends before the end time is snapped. */
+		mb();
+		end_cycles = get_cycles64();
+		if ((end_cycles - start_cycles) < word_cycles)
+			word_cycles = end_cycles - start_cycles;
+	}
+
+	byte_cycles = -1ULL;
+	__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+	start_jiffies = jiffies;
+	while ((now = jiffies) == start_jiffies)
+		cpu_relax();
+
+	while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
+		start_cycles = get_cycles64();
+		mb();
+		__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+		mb();
+		end_cycles = get_cycles64();
+		if ((end_cycles - start_cycles) < byte_cycles)
+			byte_cycles = end_cycles - start_cycles;
+	}
+
+	preempt_enable();
+
+	/* Don't divide by zero. */
+	if (!word_cycles || !byte_cycles) {
+		pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n",
+			cpu);
+
+		goto out;
+	}
+
+	if (word_cycles < byte_cycles)
+		speed = RISCV_HWPROBE_MISALIGNED_FAST;
+
+	ratio = div_u64((byte_cycles * 100), word_cycles);
+	pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
+		cpu,
+		ratio / 100,
+		ratio % 100,
+		(speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");
+
+	per_cpu(misaligned_access_speed, cpu) = speed;
+
+out:
+	__free_pages(page, get_order(MISALIGNED_BUFFER_SIZE));
+}
+
+static int check_unaligned_access_boot_cpu(void)
+{
+	check_unaligned_access(0);
+	return 0;
+}
+
+arch_initcall(check_unaligned_access_boot_cpu);
+
 #ifdef CONFIG_RISCV_ALTERNATIVE
 /*
  * Alternative patch sites consider 48 bits when determining when to patch
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index f4d6acb38dd0..00ddbd2364dc 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -26,6 +26,7 @@
 #include <linux/sched/task_stack.h>
 #include <linux/sched/mm.h>
 #include <asm/cpu_ops.h>
+#include <asm/cpufeature.h>
 #include <asm/irq.h>
 #include <asm/mmu_context.h>
 #include <asm/numa.h>
@@ -245,6 +246,7 @@ asmlinkage __visible void smp_callin(void)
 
 	numa_add_cpu(curr_cpuid);
 	set_cpu_online(curr_cpuid, 1);
+	check_unaligned_access(curr_cpuid);
 	probe_vendor_features(curr_cpuid);
 
 	if (has_vector()) {
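
For readers who want to experiment with the measurement idea outside the kernel, here is a rough userspace sketch of what check_unaligned_access() does: misalign both buffers, take the best time over several runs of a word copy and a byte copy, then classify. It is not the kernel code. Assumptions: clock_gettime() stands in for get_cycles64(), a fixed round count stands in for the jiffies window, and plain C loops stand in for the assembly routines (so the compiler may transform them). The helper names (now_ns, copy_bytes, copy_words, make_unaligned, best_time, classify) are made up for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE  0x4000                   /* same value as MISALIGNED_BUFFER_SIZE */
#define COPY_SIZE ((BUF_SIZE / 2) - 0x80)  /* same value as MISALIGNED_COPY_SIZE */

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Assumption: a monotonic nanosecond clock replaces get_cycles64(). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Byte-at-a-time copy: the baseline that word copies are compared against. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
	volatile unsigned char *d = dst;
	const volatile unsigned char *s = src;

	for (size_t i = 0; i < n; i++)
		d[i] = s[i];
}

/* Word-sized copy; memcpy keeps the misaligned accesses well defined in C. */
static void copy_words(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	for (size_t i = 0; i + sizeof(unsigned long) <= n; i += sizeof(unsigned long)) {
		unsigned long w;

		memcpy(&w, s + i, sizeof(w));
		memcpy(d + i, &w, sizeof(w));
	}
}

/* Same offsets as the patch: dst is off by 1, src is off by 1 + 2 = 3. */
static void make_unaligned(unsigned char *base, unsigned char **dst,
			   unsigned char **src)
{
	assert(((uintptr_t)base & 0x7) == 0);	/* base must be 8-byte aligned */
	*dst = (unsigned char *)((uintptr_t)base | 0x1);
	*src = *dst + (BUF_SIZE / 2) + 2;
}

/* Warm up once, then keep the smallest elapsed time over `rounds` runs. */
static uint64_t best_time(copy_fn fn, void *dst, const void *src, size_t n,
			  int rounds)
{
	uint64_t best = UINT64_MAX;

	fn(dst, src, n);
	for (int i = 0; i < rounds; i++) {
		uint64_t start = now_ns();

		fn(dst, src, n);
		uint64_t elapsed = now_ns() - start;

		if (elapsed < best)
			best = elapsed;
	}
	return best;
}

/* Returns 1 for "fast", 0 for "slow", -1 if the timer was too coarse. */
static int classify(uint64_t word_time, uint64_t byte_time, unsigned *ratio_x100)
{
	if (!word_time || !byte_time)
		return -1;	/* avoid dividing by zero, as the patch does */
	*ratio_x100 = (unsigned)((byte_time * 100) / word_time);
	return word_time < byte_time;
}
```

To use it, pass an 8-byte-aligned buffer of BUF_SIZE bytes to make_unaligned(), call best_time() once with copy_words and once with copy_bytes over COPY_SIZE bytes, and feed both results to classify(). The ratio comes back scaled by 100, so it can be printed with "%u.%02u" using ratio / 100 and ratio % 100, mirroring the pr_info() in the patch. Taking the minimum rather than the mean discards runs disturbed by interrupts or cache misses.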
