From patchwork Tue Jun 25 14:57:48 2024
X-Patchwork-Submitter: Mark Brown
X-Patchwork-Id: 13711448
From: Mark Brown
Date: Tue, 25 Jun 2024 15:57:48 +0100
Subject: [PATCH v9 20/39] arm64/gcs: Ensure that new threads have a GCS
Message-Id: <20240625-arm64-gcs-v9-20-0f634469b8f0@kernel.org>
References: <20240625-arm64-gcs-v9-0-0f634469b8f0@kernel.org>
In-Reply-To: <20240625-arm64-gcs-v9-0-0f634469b8f0@kernel.org>
To: Catalin Marinas, Will Deacon, Jonathan Corbet, Andrew Morton,
 Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose,
 Arnd Bergmann, Oleg Nesterov, Eric Biederman, Shuah Khan,
 "Rick P. Edgecombe", Deepak Gupta, Ard Biesheuvel, Szabolcs Nagy,
 Kees Cook
Cc: "H.J. Lu", Paul Walmsley, Palmer Dabbelt, Albert Ou, Florian Weimer,
 Christian Brauner, Thiago Jung Bauermann, Ross Burton,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 kvmarm@lists.linux.dev, linux-fsdevel@vger.kernel.org,
 linux-arch@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-riscv@lists.infradead.org, Mark Brown

When a new thread is created by a thread with GCS enabled the GCS needs
to be specified along with the regular stack. clone3() has been
extended to support this case, allowing userspace to explicitly specify
the size and location of the GCS. The specified GCS must have a valid
GCS token at the top of the stack, as though userspace were pivoting to
the new GCS. This will be consumed on use.
At present we do not atomically consume the token; this will be
addressed in a future revision.

Unfortunately plain clone() is not extensible and existing clone3()
users will not specify a GCS, so all existing code would be broken if
we mandated specifying one explicitly. For compatibility with these
cases and also x86 (which did not initially implement clone3() support
for shadow stacks), if no GCS is specified we allocate one when a thread
is created with GCS enabled. We follow the extensively discussed x86
implementation and allocate min(RLIMIT_STACK/2, 2G), with a minimum of
PAGE_SIZE. Since the GCS only stores the call stack and not any
variables this should be more than sufficient for most applications.

GCSs allocated via this mechanism will be freed when the thread exits;
those explicitly configured by the user will not.

Reviewed-by: Thiago Jung Bauermann
Signed-off-by: Mark Brown
---
 arch/arm64/include/asm/gcs.h |   9 +++
 arch/arm64/kernel/process.c  |  29 +++++++++
 arch/arm64/mm/gcs.c          | 143 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 181 insertions(+)

diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index 04594ef59dad..c1f274fdb9c0 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -8,6 +8,8 @@
 #include
 #include
 
+struct kernel_clone_args;
+
 static inline void gcsb_dsync(void)
 {
 	asm volatile(".inst 0xd503227f" : : : "memory");
@@ -58,6 +60,8 @@ static inline bool task_gcs_el0_enabled(struct task_struct *task)
 void gcs_set_el0_mode(struct task_struct *task);
 void gcs_free(struct task_struct *task);
 void gcs_preserve_current_state(void);
+unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
+				     const struct kernel_clone_args *args);
 
 #else
 
@@ -69,6 +73,11 @@ static inline bool task_gcs_el0_enabled(struct task_struct *task)
 static inline void gcs_set_el0_mode(struct task_struct *task) { }
 static inline void gcs_free(struct task_struct *task) { }
 static inline void gcs_preserve_current_state(void) { }
+static inline unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
+						   const struct kernel_clone_args *args)
+{
+	return -ENOTSUPP;
+}
 
 #endif
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 5f00cb0da9c3..d6d3a96cf2e4 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -285,9 +285,32 @@ static void flush_gcs(void)
 	write_sysreg_s(0, SYS_GCSPR_EL0);
 }
 
+static int copy_thread_gcs(struct task_struct *p,
+			   const struct kernel_clone_args *args)
+{
+	unsigned long gcs;
+
+	gcs = gcs_alloc_thread_stack(p, args);
+	if (IS_ERR_VALUE(gcs))
+		return PTR_ERR((void *)gcs);
+
+	p->thread.gcs_el0_mode = current->thread.gcs_el0_mode;
+	p->thread.gcs_el0_locked = current->thread.gcs_el0_locked;
+
+	/* Ensure the current state of the GCS is seen by CoW */
+	gcsb_dsync();
+
+	return 0;
+}
+
 #else
 
 static void flush_gcs(void) { }
+static int copy_thread_gcs(struct task_struct *p,
+			   const struct kernel_clone_args *args)
+{
+	return 0;
+}
 
 #endif
 
@@ -303,6 +326,7 @@ void flush_thread(void)
 void arch_release_task_struct(struct task_struct *tsk)
 {
 	fpsimd_release_task(tsk);
+	gcs_free(tsk);
 }
 
 int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
@@ -366,6 +390,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	unsigned long stack_start = args->stack;
 	unsigned long tls = args->tls;
 	struct pt_regs *childregs = task_pt_regs(p);
+	int ret;
 
 	memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));
 
@@ -407,6 +432,10 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 			p->thread.uw.tp_value = tls;
 			p->thread.tpidr2_el0 = 0;
 		}
+
+		ret = copy_thread_gcs(p, args);
+		if (ret != 0)
+			return ret;
 	} else {
 		/*
 		 * A kthread has no context to ERET to, so ensure any buggy
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index b0a67efc522b..4a3ce8e3bdfb 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -8,6 +8,139 @@
 #include
 #include
 
+static unsigned long alloc_gcs(unsigned long addr, unsigned long size,
+			       unsigned long token_offset, bool set_res_tok)
+{
+	int flags = MAP_ANONYMOUS | MAP_PRIVATE;
+	struct mm_struct *mm = current->mm;
+	unsigned long mapped_addr, unused;
+
+	if (addr)
+		flags |= MAP_FIXED_NOREPLACE;
+
+	mmap_write_lock(mm);
+	mapped_addr = do_mmap(NULL, addr, size, PROT_READ | PROT_WRITE, flags,
+			      VM_SHADOW_STACK, 0, &unused, NULL);
+	mmap_write_unlock(mm);
+
+	return mapped_addr;
+}
+
+static unsigned long gcs_size(unsigned long size)
+{
+	if (size)
+		return PAGE_ALIGN(size);
+
+	/* Allocate RLIMIT_STACK/2 with limits of PAGE_SIZE..2G */
+	size = PAGE_ALIGN(min_t(unsigned long long,
+				rlimit(RLIMIT_STACK) / 2, SZ_2G));
+	return max(PAGE_SIZE, size);
+}
+
+static bool gcs_consume_token(struct mm_struct *mm, unsigned long user_addr)
+{
+	u64 expected = GCS_CAP(user_addr);
+	u64 val;
+	int ret;
+
+	/* This should really be an atomic cmpxchg. It is not. */
+	ret = access_remote_vm(mm, user_addr, &val, sizeof(val),
+			       FOLL_FORCE);
+	if (ret != sizeof(val))
+		return false;
+
+	if (val != expected)
+		return false;
+
+	val = 0;
+	ret = access_remote_vm(mm, user_addr, &val, sizeof(val),
+			       FOLL_FORCE | FOLL_WRITE);
+	if (ret != sizeof(val))
+		return false;
+
+	return true;
+}
+
+int arch_shstk_post_fork(struct task_struct *tsk,
+			 struct kernel_clone_args *args)
+{
+	struct mm_struct *mm;
+	unsigned long addr, size, gcspr_el0;
+	int ret = 0;
+
+	mm = get_task_mm(tsk);
+	if (!mm)
+		return -EFAULT;
+
+	addr = args->shadow_stack;
+	size = args->shadow_stack_size;
+
+	/*
+	 * There should be a token, and there is likely to be an optional
+	 * end of stack marker above it.
+	 */
+	gcspr_el0 = addr + size - (2 * sizeof(u64));
+	if (!gcs_consume_token(mm, gcspr_el0)) {
+		gcspr_el0 += sizeof(u64);
+		if (!gcs_consume_token(mm, gcspr_el0)) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	tsk->thread.gcspr_el0 = gcspr_el0 + sizeof(u64);
+
+out:
+	mmput(mm);
+
+	return ret;
+}
+
+unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
+				     const struct kernel_clone_args *args)
+{
+	unsigned long addr, size;
+
+	/* If the user specified a GCS use it. */
+	if (args->shadow_stack_size) {
+		if (!system_supports_gcs())
+			return (unsigned long)ERR_PTR(-EINVAL);
+
+		/* GCSPR_EL0 will be set up when verifying token post fork */
+		addr = args->shadow_stack;
+	} else {
+
+		/*
+		 * Otherwise fall back to legacy clone() support and
+		 * implicitly allocate a GCS if we need a new one.
+		 */
+
+		if (!system_supports_gcs())
+			return 0;
+
+		if (!task_gcs_el0_enabled(tsk))
+			return 0;
+
+		if ((args->flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM) {
+			tsk->thread.gcspr_el0 = read_sysreg_s(SYS_GCSPR_EL0);
+			return 0;
+		}
+
+		size = args->stack_size;
+
+		size = gcs_size(size);
+		addr = alloc_gcs(0, size, 0, 0);
+		if (IS_ERR_VALUE(addr))
+			return addr;
+
+		tsk->thread.gcs_base = addr;
+		tsk->thread.gcs_size = size;
+		tsk->thread.gcspr_el0 = addr + size - sizeof(u64);
+	}
+
+	return addr;
+}
+
 /*
  * Apply the GCS mode configured for the specified task to the
  * hardware.
@@ -30,6 +163,16 @@ void gcs_set_el0_mode(struct task_struct *task)
 
 void gcs_free(struct task_struct *task)
 {
+
+	/*
+	 * When fork() with CLONE_VM fails, the child (tsk) already
+	 * has a GCS allocated, and exit_thread() calls this function
+	 * to free it. In this case the parent (current) and the
+	 * child share the same mm struct.
+	 */
+	if (!task->mm || task->mm != current->mm)
+		return;
+
 	if (task->thread.gcs_base)
 		vm_munmap(task->thread.gcs_base, task->thread.gcs_size);
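
Note for readers: the default sizing rule in gcs_size() ("Allocate RLIMIT_STACK/2 with limits of PAGE_SIZE..2G") can be tried out in userspace with the rough sketch below. It is not kernel code. The demo_* names and the DEMO_PAGE_SIZE and DEMO_SZ_2G constants are hypothetical stand-ins for PAGE_ALIGN(), PAGE_SIZE and SZ_2G, and the 4K page size is an assumption, since arm64 kernels may also be built with 16K or 64K pages.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's PAGE_SIZE and SZ_2G. */
#define DEMO_PAGE_SIZE	4096ULL		/* assumption: 4K pages */
#define DEMO_SZ_2G	(2ULL << 30)

static_assert((DEMO_PAGE_SIZE & (DEMO_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/* Round x up to a multiple of DEMO_PAGE_SIZE, like PAGE_ALIGN(). */
static uint64_t demo_page_align(uint64_t x)
{
	return (x + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}

/*
 * Userspace model of the rule in gcs_size(): an explicit size is page
 * aligned and used as is; otherwise take half of the stack rlimit, cap
 * it at 2G, page align it and never return less than one page.
 */
static uint64_t demo_gcs_size(uint64_t requested, uint64_t stack_rlimit)
{
	uint64_t size;

	if (requested)
		return demo_page_align(requested);

	size = stack_rlimit / 2;
	if (size > DEMO_SZ_2G)
		size = DEMO_SZ_2G;
	size = demo_page_align(size);

	return size < DEMO_PAGE_SIZE ? DEMO_PAGE_SIZE : size;
}
```

With an 8M stack rlimit, demo_gcs_size(0, 8ULL << 20) gives 4M. An unlimited rlimit (modelled here as UINT64_MAX) is capped at 2G, and a very small rlimit still yields a single page.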