From patchwork Sat Aug 24 00:49:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: yunhui cui
X-Patchwork-Id: 13776151
From: Yunhui Cui
To: punit.agrawal@bytedance.com, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, dennis@kernel.org,
 tj@kernel.org, cl@linux.com, samitolvanen@google.com, guoren@kernel.org,
 debug@rivosinc.com, charlie@rivosinc.com, cuiyunhui@bytedance.com,
 cleger@rivosinc.com, puranjay@kernel.org, antonb@tenstorrent.com,
 namcaov@gmail.com, andy.chiu@sifive.com, ajones@ventanamicro.com,
 samuel.holland@sifive.com, haxel@fzi.de, yang.zhang@hexintek.com,
 conor.dooley@microchip.com, evan@rivosinc.com,
 yang.lee@linux.alibaba.com, tglx@linutronix.de, haibo1.xu@intel.com,
 linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: [PATCH RFC] riscv: use gp to save percpu offset
Date: Sat, 24 Aug 2024 08:49:20 +0800
Message-Id: <20240824004920.35877-1-cuiyunhui@bytedance.com>
X-Mailer: git-send-email 2.39.2 (Apple Git-143)

Compared to directly fetching the per-CPU offset from memory (or cache),
using the global pointer (gp) to store the per-CPU offset can save one
memory access.
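
The saving can be illustrated with a minimal user-space sketch in C. This is not kernel code: `counters`, `fake_per_cpu_offset`, `fake_gp`, `NR_FAKE_CPUS` and all helper functions are hypothetical names invented for the illustration. The sketch only models the two lookup strategies: reading an offset table on every access versus reusing an offset that was cached once in a dedicated register (modelled here by a plain variable).

```c
#include <assert.h>
#include <stddef.h>

#define NR_FAKE_CPUS 4	/* hypothetical CPU count for this model */

/* Model of a per-CPU variable: one copy per CPU. */
static long counters[NR_FAKE_CPUS];

/* Hypothetical stand-in for the kernel's __per_cpu_offset[] table. */
static ptrdiff_t fake_per_cpu_offset[NR_FAKE_CPUS];

/* Hypothetical stand-in for the gp register holding this CPU's offset. */
static ptrdiff_t fake_gp;

/* Fill the offset table: CPU n's copy lives n * sizeof(long) bytes in. */
static void fake_setup_offsets(void)
{
	for (unsigned int cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
		fake_per_cpu_offset[cpu] = (ptrdiff_t)(cpu * sizeof(long));
}

/* Models set_my_cpu_offset(per_cpu_offset(cpu)): cache the offset once. */
static void fake_set_my_cpu_offset(unsigned int cpu)
{
	assert(cpu < NR_FAKE_CPUS);
	fake_gp = fake_per_cpu_offset[cpu];
}

/* Table-based lookup: one extra read of the offset table per access. */
static long *ptr_via_table(unsigned int cpu)
{
	return (long *)((char *)&counters[0] + fake_per_cpu_offset[cpu]);
}

/* Register-based lookup: the offset is already at hand, no table read. */
static long *ptr_via_gp(void)
{
	return (long *)((char *)&counters[0] + fake_gp);
}
```

After `fake_setup_offsets()` and `fake_set_my_cpu_offset(2)`, both `ptr_via_table(2)` and `ptr_via_gp()` resolve to `&counters[2]`; only the first one has to read the table to get there.
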

When compiling the kernel, the following must be specified explicitly,
because linker relaxation would otherwise emit gp-relative accesses that
assume gp still holds __global_pointer$:

export KCFLAGS="... -mno-relax"
export KAFLAGS="... -mno-relax"

Signed-off-by: Yunhui Cui
---
 arch/riscv/include/asm/asm.h      | 18 ++++++------------
 arch/riscv/include/asm/percpu.h   | 24 ++++++++++++++++++++++++
 arch/riscv/kernel/asm-offsets.c   |  1 +
 arch/riscv/kernel/entry.S         |  4 ++--
 arch/riscv/kernel/head.S          |  9 ---------
 arch/riscv/kernel/smpboot.c       |  7 +++++++
 arch/riscv/kernel/suspend_entry.S |  2 --
 7 files changed, 40 insertions(+), 25 deletions(-)
 create mode 100644 arch/riscv/include/asm/percpu.h

diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index 776354895b81..be4e4e5ac134 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -109,19 +109,13 @@
 	REG_L \dst, 0(\dst)
 .endm
 
-#ifdef CONFIG_SHADOW_CALL_STACK
-/* gp is used as the shadow call stack pointer instead */
-.macro load_global_pointer
+.macro load_pcpu_off_gp tmp
+	REG_L \tmp, TASK_TI_CPU(tp)
+	slli \tmp, \tmp, 3
+	la gp, __per_cpu_offset
+	add gp, gp, \tmp
+	REG_L gp, 0(gp)
 .endm
-#else
-/* load __global_pointer to gp */
-.macro load_global_pointer
-.option push
-.option norelax
-	la gp, __global_pointer$
-.option pop
-.endm
-#endif /* CONFIG_SHADOW_CALL_STACK */
 
 /* save all GPs except x1 ~ x5 */
 .macro save_from_x6_to_x31
diff --git a/arch/riscv/include/asm/percpu.h b/arch/riscv/include/asm/percpu.h
new file mode 100644
index 000000000000..858d0a93ff14
--- /dev/null
+++ b/arch/riscv/include/asm/percpu.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_PERCPU_H
+#define __ASM_PERCPU_H
+
+static inline void set_my_cpu_offset(unsigned long off)
+{
+	asm volatile("addi gp, %0, 0" :: "r" (off));
+}
+
+static inline unsigned long __kern_my_cpu_offset(void)
+{
+	unsigned long off;
+
+	asm ("mv %0, gp":"=r" (off) :);
+	return off;
+}
+
+#define __my_cpu_offset __kern_my_cpu_offset()
+
+#include
+
+#endif /* __ASM_PERCPU_H */
+
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index b09ca5f944f7..5cc6d1de4ab4 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -36,6 +36,7 @@ void asm_offsets(void)
 	OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
 	OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
 	OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
+	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
 	OFFSET(TASK_TI_FLAGS, task_struct, thread_info.flags);
 	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
 	OFFSET(TASK_TI_KERNEL_SP, task_struct, thread_info.kernel_sp);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index ac2e908d4418..39d7e66567cf 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -77,8 +77,8 @@ SYM_CODE_START(handle_exception)
 	 */
 	csrw CSR_SCRATCH, x0
 
-	/* Load the global pointer */
-	load_global_pointer
+	/* load __per_cpu_offset[cpu] to gp*/
+	load_pcpu_off_gp t6
 
 	/* Load the kernel shadow call stack pointer if coming from userspace */
 	scs_load_current_if_task_changed s5
diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index 356d5397b2a2..aa3d22967eef 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -110,9 +110,6 @@ relocate_enable_mmu:
 	la a0, .Lsecondary_park
 	csrw CSR_TVEC, a0
 
-	/* Reload the global pointer */
-	load_global_pointer
-
 	/*
 	 * Switch to kernel page tables. A full fence is necessary in order to
 	 * avoid using the trampoline translations, which are only correct for
@@ -131,9 +128,6 @@ secondary_start_sbi:
 	csrw CSR_IE, zero
 	csrw CSR_IP, zero
 
-	/* Load the global pointer */
-	load_global_pointer
-
 	/*
 	 * Disable FPU & VECTOR to detect illegal usage of
 	 * floating point or vector in kernel space
@@ -228,9 +222,6 @@ SYM_CODE_START(_start_kernel)
 	csrr a0, CSR_MHARTID
 #endif /* CONFIG_RISCV_M_MODE */
 
-	/* Load the global pointer */
-	load_global_pointer
-
 	/*
 	 * Disable FPU & VECTOR to detect illegal usage of
 	 * floating point or vector in kernel space
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 0f8f1c95ac38..844aede75662 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -41,6 +41,11 @@
 
 static DECLARE_COMPLETION(cpu_running);
 
+void __init smp_prepare_boot_cpu(void)
+{
+	set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
+}
+
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
 	int cpuid;
@@ -212,6 +217,8 @@ asmlinkage __visible void smp_callin(void)
 	struct mm_struct *mm = &init_mm;
 	unsigned int curr_cpuid = smp_processor_id();
 
+	set_my_cpu_offset(per_cpu_offset(curr_cpuid));
+
 	if (has_vector()) {
 		/*
 		 * Return as early as possible so the hart with a mismatching
diff --git a/arch/riscv/kernel/suspend_entry.S b/arch/riscv/kernel/suspend_entry.S
index 2d54f309c140..0ec850489e0c 100644
--- a/arch/riscv/kernel/suspend_entry.S
+++ b/arch/riscv/kernel/suspend_entry.S
@@ -60,8 +60,6 @@ SYM_FUNC_START(__cpu_suspend_enter)
 SYM_FUNC_END(__cpu_suspend_enter)
 
 SYM_TYPED_FUNC_START(__cpu_resume_enter)
-	/* Load the global pointer */
-	load_global_pointer
 #ifdef CONFIG_MMU
 	/* Save A0 and A1 */