From patchwork Wed Jun 16 03:21:06 2021
From: Andy Lutomirski <luto@kernel.org>
To: x86@kernel.org
Cc: Dave Hansen, LKML, linux-mm@kvack.org, Andrew Morton, Andy Lutomirski, Mathieu Desnoyers, Nicholas Piggin, Peter Zijlstra
Subject: [PATCH 1/8] membarrier: Document why membarrier() works
Date: Tue, 15 Jun 2021 20:21:06 -0700
We had a nice comment at the top of membarrier.c explaining why membarrier
worked in a handful of scenarios, but that consisted more of a list of
things not to forget than an actual description of the algorithm and why it
should be expected to work.

Add a comment explaining my understanding of the algorithm. This exposes a
couple of implementation issues that I will hopefully fix up in subsequent
patches.

Cc: Mathieu Desnoyers
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Signed-off-by: Andy Lutomirski
---
 kernel/sched/membarrier.c | 55 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index b5add64d9698..3173b063d358 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -7,6 +7,61 @@
 #include "sched.h"
 
 /*
+ * The basic principle behind the regular memory barrier mode of membarrier()
+ * is as follows. For each CPU, membarrier() operates in one of two
+ * modes. Either it sends an IPI or it does not. If membarrier() sends an
+ * IPI, then we have the following sequence of events:
+ *
+ * 1. membarrier() does smp_mb().
+ * 2. membarrier() does a store (the IPI request payload) that is observed by
+ *    the target CPU.
+ * 3. The target CPU does smp_mb().
+ * 4. The target CPU does a store (the completion indication) that is observed
+ *    by membarrier()'s wait-for-IPIs-to-finish request.
+ * 5. membarrier() does smp_mb().
+ *
+ * So all pre-membarrier() local accesses are visible after the IPI on the
+ * target CPU and all pre-IPI remote accesses are visible after
+ * membarrier(). IOW membarrier() has synchronized both ways with the target
+ * CPU.
+ *
+ * (This has the caveat that membarrier() does not interrupt the CPU that it's
+ * running on at the time it sends the IPIs. However, if that is the CPU on
+ * which membarrier() starts and/or finishes, membarrier() does smp_mb() and,
+ * if not, then membarrier() scheduled, and scheduling had better include a
+ * full barrier somewhere for basic correctness regardless of membarrier.)
+ *
+ * If membarrier() does not send an IPI, this means that membarrier() reads
+ * cpu_rq(cpu)->curr->mm and that the result is not equal to the target
+ * mm. Let's assume for now that tasks never change their mm field. The
+ * sequence of events is:
+ *
+ * 1. Target CPU switches away from the target mm (or goes lazy or has never
+ *    run the target mm in the first place). This involves smp_mb() followed
+ *    by a write to cpu_rq(cpu)->curr.
+ * 2. membarrier() does smp_mb(). (This is NOT synchronized with any action
+ *    done by the target.)
+ * 3. membarrier() observes the value written in step 1 and does *not* observe
+ *    the value written in step 5.
+ * 4. membarrier() does smp_mb().
+ * 5. Target CPU switches back to the target mm and writes to
+ *    cpu_rq(cpu)->curr. (This is NOT synchronized with any action on
+ *    membarrier()'s part.)
+ * 6. Target CPU executes smp_mb()
+ *
+ * All pre-schedule accesses on the remote CPU are visible after membarrier()
+ * because they all precede the target's write in step 1 and are synchronized
+ * to the local CPU by steps 3 and 4. All pre-membarrier() accesses on the
+ * local CPU are visible on the remote CPU after scheduling because they
+ * happen before the smp_mb(); read in steps 2 and 3 and that read precedes
+ * the target's smp_mb() in step 6.
+ *
+ * However, tasks can change their ->mm, e.g., via kthread_use_mm(). So
+ * tasks that switch their ->mm must follow the same rules as the scheduler
+ * changing rq->curr, and the membarrier() code needs to do both dereferences
+ * carefully.
+ *
+ *
  * For documentation purposes, here are some membarrier ordering
  * scenarios to keep in mind:
 *
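To make the IPI case above concrete, the handshake can be sketched like this (an illustrative sketch only, not the actual membarrier_private_expedited() implementation; the cpumask construction and completion bookkeeping are simplified away):

#include <linux/smp.h>
#include <linux/cpumask.h>

/* Sketch of where the numbered barriers of the IPI case sit. */
static void ipi_mb(void *info)
{
	smp_mb();	/* step 3; the IPI-completion store that the caller
			 * waits on plays the role of the step 4 store */
}

static void membarrier_ipi_sketch(const struct cpumask *cpus)
{
	smp_mb();					/* step 1 */
	on_each_cpu_mask(cpus, ipi_mb, NULL, true);	/* steps 2-4 (waits) */
	smp_mb();					/* step 5 */
}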
From patchwork Wed Jun 16 03:21:07 2021
From: Andy Lutomirski <luto@kernel.org>
To: x86@kernel.org
Cc: Dave Hansen, LKML, linux-mm@kvack.org, Andrew Morton, Andy Lutomirski, Mathieu Desnoyers, Nicholas Piggin, Peter Zijlstra
Subject: [PATCH 2/8] x86/mm:
Handle unlazying membarrier core sync in the arch code Date: Tue, 15 Jun 2021 20:21:07 -0700 Message-Id: <571b7e6b6a907e8a1ffc541c3f0005d347406fd0.1623813516.git.luto@kernel.org> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: B2ABF9001E40 Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=UnYUJYlm; spf=pass (imf19.hostedemail.com: domain of luto@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=luto@kernel.org; dmarc=pass (policy=none) header.from=kernel.org X-Stat-Signature: 66r1m7esssojjfczhez8aj87n38gx7m7 X-HE-Tag: 1623813665-441235 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The core scheduler isn't a great place for membarrier_mm_sync_core_before_usermode() -- the core scheduler doesn't actually know whether we are lazy. With the old code, if a CPU is running a membarrier-registered task, goes idle, gets unlazied via a TLB shootdown IPI, and switches back to the membarrier-registered task, it will do an unnecessary core sync. Conveniently, x86 is the only architecture that does anything in this sync_core_before_usermode(), so membarrier_mm_sync_core_before_usermode() is a no-op on all other architectures and we can just move the code. (I am not claiming that the SYNC_CORE code was correct before or after this change on any non-x86 architecture. I merely claim that this change improves readability, is correct on x86, and makes no change on any other architecture.) Cc: Mathieu Desnoyers Cc: Nicholas Piggin Cc: Peter Zijlstra Signed-off-by: Andy Lutomirski --- arch/x86/mm/tlb.c | 53 +++++++++++++++++++++++++++++++--------- include/linux/sched/mm.h | 13 ---------- kernel/sched/core.c | 13 ++++------ 3 files changed, 46 insertions(+), 33 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 78804680e923..59488d663e68 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -473,16 +474,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, this_cpu_write(cpu_tlbstate_shared.is_lazy, false); /* - * The membarrier system call requires a full memory barrier and - * core serialization before returning to user-space, after - * storing to rq->curr, when changing mm. This is because - * membarrier() sends IPIs to all CPUs that are in the target mm - * to make them issue memory barriers. However, if another CPU - * switches to/from the target mm concurrently with - * membarrier(), it can cause that CPU not to receive an IPI - * when it really should issue a memory barrier. Writing to CR3 - * provides that full memory barrier and core serializing - * instruction. + * membarrier() support requires that, when we change rq->curr->mm: + * + * - If next->mm has membarrier registered, a full memory barrier + * after writing rq->curr (or rq->curr->mm if we switched the mm + * without switching tasks) and before returning to user mode. + * + * - If next->mm has SYNC_CORE registered, then we sync core before + * returning to user mode. + * + * In the case where prev->mm == next->mm, membarrier() uses an IPI + * instead, and no particular barriers are needed while context + * switching. + * + * x86 gets all of this as a side-effect of writing to CR3 except + * in the case where we unlazy without flushing. 
+ * + * All other architectures are civilized and do all of this implicitly + * when transitioning from kernel to user mode. */ if (real_prev == next) { VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) != @@ -500,7 +509,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, /* * If the CPU is not in lazy TLB mode, we are just switching * from one thread in a process to another thread in the same - * process. No TLB flush required. + * process. No TLB flush or membarrier() synchronization + * is required. */ if (!was_lazy) return; @@ -510,16 +520,35 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, * If the TLB is up to date, just use it. * The barrier synchronizes with the tlb_gen increment in * the TLB shootdown code. + * + * As a future optimization opportunity, it's plausible + * that the x86 memory model is strong enough that this + * smp_mb() isn't needed. */ smp_mb(); next_tlb_gen = atomic64_read(&next->context.tlb_gen); if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) == - next_tlb_gen) + next_tlb_gen) { +#ifdef CONFIG_MEMBARRIER + /* + * We switched logical mm but we're not going to + * write to CR3. We already did smp_mb() above, + * but membarrier() might require a sync_core() + * as well. + */ + if (unlikely(atomic_read(&next->membarrier_state) & + MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)) + sync_core_before_usermode(); +#endif + return; + } /* * TLB contents went out of date while we were in lazy * mode. Fall through to the TLB switching code below. + * No need for an explicit membarrier invocation -- the CR3 + * write will serialize. */ new_asid = prev_asid; need_flush = true; diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index e24b1fe348e3..24d97d1b6252 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -345,16 +345,6 @@ enum { #include #endif -static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm) -{ - if (current->mm != mm) - return; - if (likely(!(atomic_read(&mm->membarrier_state) & - MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE))) - return; - sync_core_before_usermode(); -} - extern void membarrier_exec_mmap(struct mm_struct *mm); extern void membarrier_update_current_mm(struct mm_struct *next_mm); @@ -370,9 +360,6 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev, static inline void membarrier_exec_mmap(struct mm_struct *mm) { } -static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm) -{ -} static inline void membarrier_update_current_mm(struct mm_struct *next_mm) { } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 5226cc26a095..e4c122f8bf21 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4220,22 +4220,19 @@ static struct rq *finish_task_switch(struct task_struct *prev) kmap_local_sched_in(); fire_sched_in_preempt_notifiers(current); + /* * When switching through a kernel thread, the loop in * membarrier_{private,global}_expedited() may have observed that * kernel thread and not issued an IPI. It is therefore possible to * schedule between user->kernel->user threads without passing though * switch_mm(). Membarrier requires a barrier after storing to - * rq->curr, before returning to userspace, so provide them here: - * - * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly - * provided by mmdrop(), - * - a sync_core for SYNC_CORE. + * rq->curr, before returning to userspace, and mmdrop() provides + * this barrier. 
*/ - if (mm) { - membarrier_mm_sync_core_before_usermode(mm); + if (mm) mmdrop(mm); - } + if (unlikely(prev_state == TASK_DEAD)) { if (prev->sched_class->task_dead) prev->sched_class->task_dead(prev);
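For context on the MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE check that this patch adds to the unlazy path: that bit is set when a process registers for the SYNC_CORE command from user space, roughly like this (sketch only; jit_membarrier_init() is a hypothetical helper and error handling is omitted):

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * User-space side (sketch): opt the process in to SYNC_CORE. This is what
 * ends up setting MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE in
 * mm->membarrier_state, which the unlazy path in this patch tests.
 */
static int jit_membarrier_init(void)
{
	return syscall(__NR_membarrier,
		       MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0);
}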
From patchwork Wed Jun 16 03:21:08 2021
From: Andy Lutomirski <luto@kernel.org>
To: x86@kernel.org
Cc: Dave Hansen, LKML, linux-mm@kvack.org, Andrew Morton, Andy Lutomirski, Mathieu Desnoyers, Nicholas Piggin, Peter Zijlstra
Subject: [PATCH 3/8] membarrier: Remove membarrier_arch_switch_mm() prototype in core code
Date: Tue, 15 Jun 2021 20:21:08 -0700

membarrier_arch_switch_mm()'s sole implementation and caller are in
arch/powerpc. Having a fallback implementation in include/linux is
confusing -- remove it.

It's still mentioned in a comment, but a subsequent patch will remove it.

Cc: Mathieu Desnoyers
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Signed-off-by: Andy Lutomirski
Acked-by: Nicholas Piggin
Acked-by: Mathieu Desnoyers
--- include/linux/sched/mm.h | 7 ------- 1 file changed, 7 deletions(-) diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 24d97d1b6252..10aace21d25e 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -350,13 +350,6 @@ extern void membarrier_exec_mmap(struct mm_struct *mm); extern void membarrier_update_current_mm(struct mm_struct *next_mm); #else -#ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS -static inline void membarrier_arch_switch_mm(struct mm_struct *prev, - struct mm_struct *next, - struct task_struct *tsk) -{ -} -#endif static inline void membarrier_exec_mmap(struct mm_struct *mm) { }
From patchwork Wed Jun 16 03:21:09 2021
920D09001E40 for ; Wed, 16 Jun 2021 03:21:06 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 2BE8C613DB; Wed, 16 Jun 2021 03:21:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1623813678; bh=Oj1vCsN2BPGc3P+iT1tIZ7CyX6hi/S+z5frsn89G8kU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=l+grdF5BoRBtmAGqZOzBiIZOdQTNRAwWMfymPVh6qQaUACHWRoqCOU9xuf/XoD8HL k1hynejm8pVU7rcCf6coWPb+VDmuPAFALr4scJTmp2jEZsIz5xXhjp7Lw3NpOyOzEn POhrzEdWS1PAuP91maVCYV2PKfOD3xidT+nPz9WEBSfOYUpKSH/4JXaegRFdBhkZZA r+tlWEHo8wjhNFLf9CISJfxBvw1LqJYSQPYfmCfT9nuFB/suk1ItIww8Ey+Qp2NFTQ BGtV2zEZ7mWYUB4eFnTu+b5ipmnzw2lDjdwFwTe3AlFW3KCooucfP6Z0lxFsaNkARq bCR4ItuLs3tqQ== From: Andy Lutomirski To: x86@kernel.org Cc: Dave Hansen , LKML , linux-mm@kvack.org, Andrew Morton , Andy Lutomirski , Mathieu Desnoyers , Nicholas Piggin , Peter Zijlstra Subject: [PATCH 4/8] membarrier: Make the post-switch-mm barrier explicit Date: Tue, 15 Jun 2021 20:21:09 -0700 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 920D09001E40 Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=l+grdF5B; spf=pass (imf19.hostedemail.com: domain of luto@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=luto@kernel.org; dmarc=pass (policy=none) header.from=kernel.org X-Stat-Signature: w4emxomp3sz3sds7pco5i18hyr5zczxs X-HE-Tag: 1623813666-297162 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: membarrier() needs a barrier after any CPU changes mm. There is currently a comment explaining why this barrier probably exists in all cases. This is very fragile -- any change to the relevant parts of the scheduler might get rid of these barriers, and it's not really clear to me that the barrier actually exists in all necessary cases. Simplify the logic by adding an explicit barrier, and allow architectures to override it as an optimization if they want to. One of the deleted comments in this patch said "It is therefore possible to schedule between user->kernel->user threads without passing through switch_mm()". It is possible to do this without, say, writing to CR3 on x86, but the core scheduler indeed calls switch_mm_irqs_off() to tell the arch code to go back from lazy mode to no-lazy mode. Cc: Mathieu Desnoyers Cc: Nicholas Piggin Cc: Peter Zijlstra Signed-off-by: Andy Lutomirski --- include/linux/sched/mm.h | 21 +++++++++++++++++++++ kernel/kthread.c | 12 +----------- kernel/sched/core.c | 35 +++++++++-------------------------- 3 files changed, 31 insertions(+), 37 deletions(-) diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 10aace21d25e..c6eebbafadb0 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -341,6 +341,27 @@ enum { MEMBARRIER_FLAG_RSEQ = (1U << 1), }; +#ifdef CONFIG_MEMBARRIER + +/* + * Called by the core scheduler after calling switch_mm_irqs_off(). + * Architectures that have implicit barriers when switching mms can + * override this as an optimization. 
+ */ +#ifndef membarrier_finish_switch_mm +static inline void membarrier_finish_switch_mm(int membarrier_state) +{ + if (membarrier_state & (MEMBARRIER_STATE_GLOBAL_EXPEDITED | MEMBARRIER_STATE_PRIVATE_EXPEDITED)) + smp_mb(); +} +#endif + +#else + +static inline void membarrier_finish_switch_mm(int membarrier_state) {} + +#endif + #ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS #include #endif diff --git a/kernel/kthread.c b/kernel/kthread.c index fe3f2a40d61e..8275b415acec 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -1325,25 +1325,15 @@ void kthread_use_mm(struct mm_struct *mm) tsk->mm = mm; membarrier_update_current_mm(mm); switch_mm_irqs_off(active_mm, mm, tsk); + membarrier_finish_switch_mm(atomic_read(&mm->membarrier_state)); local_irq_enable(); task_unlock(tsk); #ifdef finish_arch_post_lock_switch finish_arch_post_lock_switch(); #endif - /* - * When a kthread starts operating on an address space, the loop - * in membarrier_{private,global}_expedited() may not observe - * that tsk->mm, and not issue an IPI. Membarrier requires a - * memory barrier after storing to tsk->mm, before accessing - * user-space memory. A full memory barrier for membarrier - * {PRIVATE,GLOBAL}_EXPEDITED is implicitly provided by - * mmdrop(), or explicitly with smp_mb(). - */ if (active_mm != mm) mmdrop(active_mm); - else - smp_mb(); to_kthread(tsk)->oldfs = force_uaccess_begin(); } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e4c122f8bf21..329a6d2a4e13 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4221,15 +4221,6 @@ static struct rq *finish_task_switch(struct task_struct *prev) fire_sched_in_preempt_notifiers(current); - /* - * When switching through a kernel thread, the loop in - * membarrier_{private,global}_expedited() may have observed that - * kernel thread and not issued an IPI. It is therefore possible to - * schedule between user->kernel->user threads without passing though - * switch_mm(). Membarrier requires a barrier after storing to - * rq->curr, before returning to userspace, and mmdrop() provides - * this barrier. - */ if (mm) mmdrop(mm); @@ -4311,15 +4302,14 @@ context_switch(struct rq *rq, struct task_struct *prev, prev->active_mm = NULL; } else { // to user membarrier_switch_mm(rq, prev->active_mm, next->mm); + switch_mm_irqs_off(prev->active_mm, next->mm, next); + /* * sys_membarrier() requires an smp_mb() between setting - * rq->curr / membarrier_switch_mm() and returning to userspace. - * - * The below provides this either through switch_mm(), or in - * case 'prev->active_mm == next->mm' through - * finish_task_switch()'s mmdrop(). + * rq->curr->mm to a membarrier-enabled mm and returning + * to userspace. */ - switch_mm_irqs_off(prev->active_mm, next->mm, next); + membarrier_finish_switch_mm(rq->membarrier_state); if (!prev->mm) { // from kernel /* will mmdrop() in finish_task_switch(). */ @@ -5121,17 +5111,10 @@ static void __sched notrace __schedule(bool preempt) RCU_INIT_POINTER(rq->curr, next); /* * The membarrier system call requires each architecture - * to have a full memory barrier after updating - * rq->curr, before returning to user-space. - * - * Here are the schemes providing that barrier on the - * various architectures: - * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC. - * switch_mm() rely on membarrier_arch_switch_mm() on PowerPC. 
- * - finish_lock_switch() for weakly-ordered - * architectures where spin_unlock is a full barrier, - * - switch_to() for arm64 (weakly-ordered, spin_unlock - * is a RELEASE barrier), + * to have a full memory barrier before and after updating + * rq->curr->mm, before returning to userspace. This + * is provided by membarrier_finish_switch_mm(). Architectures + * that want to optimize this can override that function. */ ++*switch_count; From patchwork Wed Jun 16 03:21:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andy Lutomirski X-Patchwork-Id: 12323813 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2281C48BE5 for ; Wed, 16 Jun 2021 03:21:26 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8C2EF61246 for ; Wed, 16 Jun 2021 03:21:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8C2EF61246 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B8EE36B0073; Tue, 15 Jun 2021 23:21:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B44AF6B0074; Tue, 15 Jun 2021 23:21:20 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A094E6B0075; Tue, 15 Jun 2021 23:21:20 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0210.hostedemail.com [216.40.44.210]) by kanga.kvack.org (Postfix) with ESMTP id 5EE656B0073 for ; Tue, 15 Jun 2021 23:21:20 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 0CBC7A8F4 for ; Wed, 16 Jun 2021 03:21:20 +0000 (UTC) X-FDA: 78258136320.09.3CAB023 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf24.hostedemail.com (Postfix) with ESMTP id C0E82A0001A9 for ; Wed, 16 Jun 2021 03:21:09 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id CBB8B613B9; Wed, 16 Jun 2021 03:21:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1623813678; bh=q8HZKPxpOM/5sR9eo6stVaodXdedvsUQjQBqGomYFTM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=PVgslUPvnm4dag5figXFo7ByhJdeLViFstuVBA+2bOh0q+jJHFNjAdQ7OpvTwfLO/ e3N/qUhm7guRAWmGpjwLX9hfG3lnJpHOriP8yMB7h7o+8XSPrdeWGBjmnx4GETkErX jGk2ZjznvfntfeJKtdGpxUnGlbz+4wNCAmbFxySDGlhuJPuIteVF1+Nkxr1WP6Wjwv wVxUjteZpKgHgRynJAVOte0dGblqJ8Ae3qokLj6BiRa9XkfLj2r5FtpB74jgw8ZY5+ ZVrFcz4GKe0g0AhBux8EONMOlllRPzI1zDXWURMqMn27BBLuiAM7SOqCIN1oYlj2LX vUkKcug1HeGlw== From: Andy Lutomirski To: x86@kernel.org Cc: Dave Hansen , LKML , linux-mm@kvack.org, Andrew Morton , Andy Lutomirski , Mathieu Desnoyers , Nicholas Piggin , Peter Zijlstra Subject: [PATCH 5/8] membarrier, kthread: Use _ONCE accessors for task->mm Date: Tue, 15 Jun 2021 20:21:10 -0700 Message-Id: 
<74ace142f48db7d0e71b05b5ace72bfe8e0a2652.1623813516.git.luto@kernel.org> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=PVgslUPv; spf=pass (imf24.hostedemail.com: domain of luto@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=luto@kernel.org; dmarc=pass (policy=none) header.from=kernel.org X-Stat-Signature: a8cp5wx9zot8gsx36j8mkdu43n4ct6gm X-Rspamd-Queue-Id: C0E82A0001A9 X-Rspamd-Server: rspam06 X-HE-Tag: 1623813669-198599 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: membarrier reads cpu_rq(remote cpu)->curr->mm without locking. Use READ_ONCE() and WRITE_ONCE() to remove the data races. Cc: Mathieu Desnoyers Cc: Nicholas Piggin Cc: Peter Zijlstra Signed-off-by: Andy Lutomirski Acked-by: Nicholas Piggin --- fs/exec.c | 2 +- kernel/kthread.c | 4 ++-- kernel/sched/membarrier.c | 6 +++--- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 18594f11c31f..2e63dea83411 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1007,7 +1007,7 @@ static int exec_mmap(struct mm_struct *mm) local_irq_disable(); active_mm = tsk->active_mm; tsk->active_mm = mm; - tsk->mm = mm; + WRITE_ONCE(tsk->mm, mm); /* membarrier reads this without locks */ /* * This prevents preemption while active_mm is being loaded and * it and mm are being updated, which could cause problems for diff --git a/kernel/kthread.c b/kernel/kthread.c index 8275b415acec..4962794e02d5 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -1322,7 +1322,7 @@ void kthread_use_mm(struct mm_struct *mm) mmgrab(mm); tsk->active_mm = mm; } - tsk->mm = mm; + WRITE_ONCE(tsk->mm, mm); /* membarrier reads this without locks */ membarrier_update_current_mm(mm); switch_mm_irqs_off(active_mm, mm, tsk); membarrier_finish_switch_mm(atomic_read(&mm->membarrier_state)); @@ -1363,7 +1363,7 @@ void kthread_unuse_mm(struct mm_struct *mm) smp_mb__after_spinlock(); sync_mm_rss(mm); local_irq_disable(); - tsk->mm = NULL; + WRITE_ONCE(tsk->mm, NULL); /* membarrier reads this without locks */ membarrier_update_current_mm(NULL); /* active_mm is still 'mm' */ enter_lazy_tlb(mm, tsk); diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c index 3173b063d358..c32c32a2441e 100644 --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -410,7 +410,7 @@ static int membarrier_private_expedited(int flags, int cpu_id) goto out; rcu_read_lock(); p = rcu_dereference(cpu_rq(cpu_id)->curr); - if (!p || p->mm != mm) { + if (!p || READ_ONCE(p->mm) != mm) { rcu_read_unlock(); goto out; } @@ -423,7 +423,7 @@ static int membarrier_private_expedited(int flags, int cpu_id) struct task_struct *p; p = rcu_dereference(cpu_rq(cpu)->curr); - if (p && p->mm == mm) + if (p && READ_ONCE(p->mm) == mm) __cpumask_set_cpu(cpu, tmpmask); } rcu_read_unlock(); @@ -521,7 +521,7 @@ static int sync_runqueues_membarrier_state(struct mm_struct *mm) struct task_struct *p; p = rcu_dereference(rq->curr); - if (p && p->mm == mm) + if (p && READ_ONCE(p->mm) == mm) __cpumask_set_cpu(cpu, tmpmask); } rcu_read_unlock(); From patchwork Wed Jun 16 03:21:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andy Lutomirski X-Patchwork-Id: 12323817 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
From: Andy Lutomirski <luto@kernel.org>
To: x86@kernel.org
Cc: Dave Hansen, LKML, linux-mm@kvack.org, Andrew Morton, Andy Lutomirski, Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev@lists.ozlabs.org, Nicholas Piggin, Mathieu Desnoyers, Peter Zijlstra
Subject: [PATCH 6/8] powerpc/membarrier: Remove special barrier on mm switch
Date: Tue, 15 Jun 2021 20:21:11 -0700
owner-majordomo@kvack.org List-ID: powerpc did the following on some, but not all, paths through switch_mm_irqs_off(): /* * Only need the full barrier when switching between processes. * Barrier when switching from kernel to userspace is not * required here, given that it is implied by mmdrop(). Barrier * when switching from userspace to kernel is not needed after * store to rq->curr. */ if (likely(!(atomic_read(&next->membarrier_state) & (MEMBARRIER_STATE_PRIVATE_EXPEDITED | MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev)) return; This is puzzling: if !prev, then one might expect that we are switching from kernel to user, not user to kernel, which is inconsistent with the comment. But this is all nonsense, because the one and only caller would never have prev == NULL and would, in fact, OOPS if prev == NULL. In any event, this code is unnecessary, since the new generic membarrier_finish_switch_mm() provides the same barrier without arch help. Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: linuxppc-dev@lists.ozlabs.org Cc: Nicholas Piggin Cc: Mathieu Desnoyers Cc: Peter Zijlstra Signed-off-by: Andy Lutomirski --- arch/powerpc/include/asm/membarrier.h | 27 --------------------------- arch/powerpc/mm/mmu_context.c | 2 -- 2 files changed, 29 deletions(-) delete mode 100644 arch/powerpc/include/asm/membarrier.h diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h deleted file mode 100644 index 6e20bb5c74ea..000000000000 --- a/arch/powerpc/include/asm/membarrier.h +++ /dev/null @@ -1,27 +0,0 @@ -#ifndef _ASM_POWERPC_MEMBARRIER_H -#define _ASM_POWERPC_MEMBARRIER_H - -static inline void membarrier_arch_switch_mm(struct mm_struct *prev, - struct mm_struct *next, - struct task_struct *tsk) -{ - /* - * Only need the full barrier when switching between processes. - * Barrier when switching from kernel to userspace is not - * required here, given that it is implied by mmdrop(). Barrier - * when switching from userspace to kernel is not needed after - * store to rq->curr. - */ - if (likely(!(atomic_read(&next->membarrier_state) & - (MEMBARRIER_STATE_PRIVATE_EXPEDITED | - MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev)) - return; - - /* - * The membarrier system call requires a full memory barrier - * after storing to rq->curr, before going back to user-space. 
- */ - smp_mb(); -} - -#endif /* _ASM_POWERPC_MEMBARRIER_H */ diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c index a857af401738..8daa95b3162b 100644 --- a/arch/powerpc/mm/mmu_context.c +++ b/arch/powerpc/mm/mmu_context.c @@ -85,8 +85,6 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, if (new_on_cpu) radix_kvm_prefetch_workaround(next); - else - membarrier_arch_switch_mm(prev, next, tsk); /* * The actual HW switching method differs between the various
From patchwork Wed Jun 16 03:21:12 2021
From: Andy Lutomirski <luto@kernel.org>
To: x86@kernel.org
Cc: Dave Hansen, LKML, linux-mm@kvack.org, Andrew Morton, Andy Lutomirski, Mathieu Desnoyers, Nicholas Piggin, Peter Zijlstra, Russell King, linux-arm-kernel@lists.infradead.org
Subject: [PATCH 7/8] membarrier: Remove arm (32) support for SYNC_CORE
Date: Tue, 15 Jun 2021 20:21:12 -0700
On arm32, the only way to safely flush icache from usermode is to call
cacheflush(2). This also handles any required pipeline flushes, so
membarrier's SYNC_CORE feature is useless on arm. Remove it.

Cc: Mathieu Desnoyers
Cc: Nicholas Piggin
Cc: Peter Zijlstra
Cc: Russell King
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Andy Lutomirski
Acked-by: Russell King (Oracle)
--- arch/arm/Kconfig | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 24804f11302d..89a885fba724 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -10,7 +10,6 @@ config ARM select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_KEEPINITRD select ARCH_HAS_KCOV - select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PTE_SPECIAL if ARM_LPAE select ARCH_HAS_PHYS_TO_DMA
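As an illustration of the claim above that cacheflush(2) already covers what SYNC_CORE would be asked to do on arm32, a user-space JIT typically publishes freshly written code like this (a sketch only; the buffer and length are placeholders, and __builtin___clear_cache() is the compiler's portable wrapper that expands to the cacheflush syscall on ARM Linux):

#include <stddef.h>

/* Sketch: publishing freshly written instructions on 32-bit ARM. */
static void arm32_publish_code(char *buf, size_t len)
{
	/* ... instructions have been written into buf ... */

	/*
	 * Flush the range via cacheflush(2); per the commit message above,
	 * this also performs the pipeline maintenance that membarrier's
	 * SYNC_CORE would otherwise provide, so no membarrier() call is
	 * needed here.
	 */
	__builtin___clear_cache(buf, buf + len);
}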
+0000 (UTC) X-FDA: 78258136614.35.F7FF23E Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf05.hostedemail.com (Postfix) with ESMTP id 01961E000251 for ; Wed, 16 Jun 2021 03:21:13 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id D051F613C7; Wed, 16 Jun 2021 03:21:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1623813681; bh=c1mW/zRWUMzz2nK2JXBUnfYw2Qug1l+u6NJy058tCKU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=B70I5bf52TGkG4s1VVE6Hd7Y7ayRYIqoQY+VCilqwGmR637HAkxETW5LDO2WUAU41 JPrMUMhJxEVryOCY1LLy6bILDYQkRx/Up5OfTCeWH1y8+ocLCbPZKDINW09hhWUMhf x/iKZE8FCbJ1rxisppx4MLiLiEzDxLiSKnuAbNbxuObVxTQGxiJsQwt8JbAwy86JJb 5d2zQrZFSt6N3/cbRkVui0tket2wwvCsLk4YOXvILUCJUWirQLDg8Frqib8KziXQLk ysBwfFSYxXKB/Jb9ME6MVI7PetyyF1uTaBSC8eXu4Av1hnsrGlkvApjo+ZhKswSnv/ m6AaPR7Pa2FhA== From: Andy Lutomirski To: x86@kernel.org Cc: Dave Hansen , LKML , linux-mm@kvack.org, Andrew Morton , Andy Lutomirski , Michael Ellerman , Benjamin Herrenschmidt , Paul Mackerras , linuxppc-dev@lists.ozlabs.org, Nicholas Piggin , Catalin Marinas , Will Deacon , linux-arm-kernel@lists.infradead.org, Mathieu Desnoyers , Peter Zijlstra , stable@vger.kernel.org Subject: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation Date: Tue, 15 Jun 2021 20:21:13 -0700 Message-Id: <07a8b963002cb955b7516e61bad19514a3acaa82.1623813516.git.luto@kernel.org> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 01961E000251 Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=B70I5bf5; spf=pass (imf05.hostedemail.com: domain of luto@kernel.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=luto@kernel.org; dmarc=pass (policy=none) header.from=kernel.org X-Stat-Signature: 99o8rza5befpweu1s5hbum7m4p4hu974 X-HE-Tag: 1623813673-251592 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The old sync_core_before_usermode() comments suggested that a non-icache-syncing return-to-usermode instruction is x86-specific and that all other architectures automatically notice cross-modified code on return to userspace. This is misleading. The incantation needed to modify code from one CPU and execute it on another CPU is highly architecture dependent. On x86, according to the SDM, one must modify the code, issue SFENCE if the modification was WC or nontemporal, and then issue a "serializing instruction" on the CPU that will execute the code. membarrier() can do the latter. On arm64 and powerpc, one must flush the icache and then flush the pipeline on the target CPU, although the CPU manuals don't necessarily use this language. So let's drop any pretense that we can have a generic way to define or implement membarrier's SYNC_CORE operation and instead require all architectures to define the helper and supply their own documentation as to how to use it. This means x86, arm64, and powerpc for now. Let's also rename the function from sync_core_before_usermode() to membarrier_sync_core_before_usermode() because the precise flushing details may very well be specific to membarrier, and even the concept of "sync_core" in the kernel is mostly an x86-ism. 
(It may well be the case that, on real x86 processors, synchronizing the icache (which requires no action at all) and "flushing the pipeline" is sufficient, but trying to use this language would be confusing at best. LFENCE does something awfully like "flushing the pipeline", but the SDM does not permit LFENCE as an alternative to a "serializing instruction" for this purpose.) Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: linuxppc-dev@lists.ozlabs.org Cc: Nicholas Piggin Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-kernel@lists.infradead.org Cc: Mathieu Desnoyers Cc: Nicholas Piggin Cc: Peter Zijlstra Cc: x86@kernel.org Cc: stable@vger.kernel.org Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE") Signed-off-by: Andy Lutomirski Acked-by: Nicholas Piggin Acked-by: Will Deacon --- .../membarrier-sync-core/arch-support.txt | 68 ++++++------------- arch/arm64/include/asm/sync_core.h | 19 ++++++ arch/powerpc/include/asm/sync_core.h | 14 ++++ arch/x86/Kconfig | 1 - arch/x86/include/asm/sync_core.h | 7 +- arch/x86/kernel/alternative.c | 2 +- arch/x86/kernel/cpu/mce/core.c | 2 +- arch/x86/mm/tlb.c | 3 +- drivers/misc/sgi-gru/grufault.c | 2 +- drivers/misc/sgi-gru/gruhandles.c | 2 +- drivers/misc/sgi-gru/grukservices.c | 2 +- include/linux/sched/mm.h | 1 - include/linux/sync_core.h | 21 ------ init/Kconfig | 3 - kernel/sched/membarrier.c | 15 ++-- 15 files changed, 75 insertions(+), 87 deletions(-) create mode 100644 arch/arm64/include/asm/sync_core.h create mode 100644 arch/powerpc/include/asm/sync_core.h delete mode 100644 include/linux/sync_core.h diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt index 883d33b265d6..41c9ebcb275f 100644 --- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt +++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt @@ -5,51 +5,25 @@ # # Architecture requirements # -# * arm/arm64/powerpc # -# Rely on implicit context synchronization as a result of exception return -# when returning from IPI handler, and when returning to user-space. -# -# * x86 -# -# x86-32 uses IRET as return from interrupt, which takes care of the IPI. -# However, it uses both IRET and SYSEXIT to go back to user-space. The IRET -# instruction is core serializing, but not SYSEXIT. -# -# x86-64 uses IRET as return from interrupt, which takes care of the IPI. -# However, it can return to user-space through either SYSRETL (compat code), -# SYSRETQ, or IRET. -# -# Given that neither SYSRET{L,Q}, nor SYSEXIT, are core serializing, we rely -# instead on write_cr3() performed by switch_mm() to provide core serialization -# after changing the current mm, and deal with the special case of kthread -> -# uthread (temporarily keeping current mm into active_mm) by issuing a -# sync_core_before_usermode() in that specific case. 
-# - ----------------------- - | arch |status| - ----------------------- - | alpha: | TODO | - | arc: | TODO | - | arm: | ok | - | arm64: | ok | - | csky: | TODO | - | h8300: | TODO | - | hexagon: | TODO | - | ia64: | TODO | - | m68k: | TODO | - | microblaze: | TODO | - | mips: | TODO | - | nds32: | TODO | - | nios2: | TODO | - | openrisc: | TODO | - | parisc: | TODO | - | powerpc: | ok | - | riscv: | TODO | - | s390: | TODO | - | sh: | TODO | - | sparc: | TODO | - | um: | TODO | - | x86: | ok | - | xtensa: | TODO | - ----------------------- +# An architecture that wants to support +# MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE needs to define precisely what it +# is supposed to do and implement membarrier_sync_core_before_usermode() to +# make it do that. Then it can select ARCH_HAS_MEMBARRIER_SYNC_CORE via +# Kconfig. Unfortunately, MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE is not a +# fantastic API and may not make sense on all architectures. Once an +# architecture meets these requirements, the guarantees are as follows: +# +# On x86, a program can safely modify code, issue +# MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, and then execute that code, via +# the modified address or an alias, from any thread in the calling process. +# +# On arm64, a program can modify code, flush the icache as needed, and issue +# MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE to force a "context synchronizing +# event", aka pipeline flush on all CPUs that might run the calling process. +# Then the program can execute the modified code as long as it is executed +# from an address consistent with the icache flush and the CPU's cache type. +# +# On powerpc, a program can use MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE +# similarly to arm64. It would be nice if the powerpc maintainers could +# add a clearer explanation. diff --git a/arch/arm64/include/asm/sync_core.h b/arch/arm64/include/asm/sync_core.h new file mode 100644 index 000000000000..74996bf533bb --- /dev/null +++ b/arch/arm64/include/asm/sync_core.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_ARM64_SYNC_CORE_H +#define _ASM_ARM64_SYNC_CORE_H + +#include + +/* + * On arm64, anyone trying to use membarrier() to handle JIT code is + * required to first flush the icache and then do SYNC_CORE. All that's + * needed after the icache flush is to execute a "context synchronization + * event". Right now, ERET does this, and we are guaranteed to ERET before + * any user code runs. If Linux ever programs the CPU to make ERET stop + * being a context synchronizing event, then this will need to be adjusted. + */ +static inline void membarrier_sync_core_before_usermode(void) +{ +} + +#endif /* _ASM_ARM64_SYNC_CORE_H */ diff --git a/arch/powerpc/include/asm/sync_core.h b/arch/powerpc/include/asm/sync_core.h new file mode 100644 index 000000000000..589fdb34beab --- /dev/null +++ b/arch/powerpc/include/asm/sync_core.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_SYNC_CORE_H +#define _ASM_POWERPC_SYNC_CORE_H + +#include + +/* + * XXX: can a powerpc person put an appropriate comment here?
+ */ +static inline void membarrier_sync_core_before_usermode(void) +{ +} + +#endif /* _ASM_POWERPC_SYNC_CORE_H */ diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 0045e1b44190..f010897a1e8a 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -89,7 +89,6 @@ config X86 select ARCH_HAS_SET_DIRECT_MAP select ARCH_HAS_STRICT_KERNEL_RWX select ARCH_HAS_STRICT_MODULE_RWX - select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE select ARCH_HAS_SYSCALL_WRAPPER select ARCH_HAS_UBSAN_SANITIZE_ALL select ARCH_HAS_DEBUG_WX diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h index ab7382f92aff..c665b453969a 100644 --- a/arch/x86/include/asm/sync_core.h +++ b/arch/x86/include/asm/sync_core.h @@ -89,11 +89,10 @@ static inline void sync_core(void) } /* - * Ensure that a core serializing instruction is issued before returning - * to user-mode. x86 implements return to user-space through sysexit, - * sysrel, and sysretq, which are not core serializing. + * Ensure that the CPU notices any instruction changes before the next time + * it returns to usermode. */ -static inline void sync_core_before_usermode(void) +static inline void membarrier_sync_core_before_usermode(void) { /* With PTI, we unconditionally serialize before running user code. */ if (static_cpu_has(X86_FEATURE_PTI)) diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 6974b5174495..52ead5f4fcdc 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -17,7 +17,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index bf7fe87a7e88..4a577980d4d1 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -41,12 +41,12 @@ #include #include #include -#include #include #include #include #include +#include #include #include #include diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 59488d663e68..35b622fd2ed1 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -11,6 +11,7 @@ #include #include +#include #include #include #include @@ -538,7 +539,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, */ if (unlikely(atomic_read(&next->membarrier_state) & MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)) - sync_core_before_usermode(); + membarrier_sync_core_before_usermode(); #endif return; diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c index 723825524ea0..48fd5b101de1 100644 --- a/drivers/misc/sgi-gru/grufault.c +++ b/drivers/misc/sgi-gru/grufault.c @@ -20,8 +20,8 @@ #include #include #include -#include #include +#include #include "gru.h" #include "grutables.h" #include "grulib.h" diff --git a/drivers/misc/sgi-gru/gruhandles.c b/drivers/misc/sgi-gru/gruhandles.c index 1d75d5e540bc..c8cba1c1b00f 100644 --- a/drivers/misc/sgi-gru/gruhandles.c +++ b/drivers/misc/sgi-gru/gruhandles.c @@ -16,7 +16,7 @@ #define GRU_OPERATION_TIMEOUT (((cycles_t) local_cpu_data->itc_freq)*10) #define CLKS2NSEC(c) ((c) *1000000000 / local_cpu_data->itc_freq) #else -#include +#include #include #define GRU_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000) #define CLKS2NSEC(c) ((c) * 1000000 / tsc_khz) diff --git a/drivers/misc/sgi-gru/grukservices.c b/drivers/misc/sgi-gru/grukservices.c index 0ea923fe6371..ce03ff3f7c3a 100644 --- a/drivers/misc/sgi-gru/grukservices.c +++ b/drivers/misc/sgi-gru/grukservices.c @@ -16,10 +16,10 @@ #include #include #include -#include #include #include #include +#include #include #include 
"gru.h" #include "grulib.h" diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index c6eebbafadb0..845db11190cd 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -7,7 +7,6 @@ #include #include #include -#include /* * Routines for handling mm_structs diff --git a/include/linux/sync_core.h b/include/linux/sync_core.h deleted file mode 100644 index 013da4b8b327..000000000000 --- a/include/linux/sync_core.h +++ /dev/null @@ -1,21 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _LINUX_SYNC_CORE_H -#define _LINUX_SYNC_CORE_H - -#ifdef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE -#include -#else -/* - * This is a dummy sync_core_before_usermode() implementation that can be used - * on all architectures which return to user-space through core serializing - * instructions. - * If your architecture returns to user-space through non-core-serializing - * instructions, you need to write your own functions. - */ -static inline void sync_core_before_usermode(void) -{ -} -#endif - -#endif /* _LINUX_SYNC_CORE_H */ - diff --git a/init/Kconfig b/init/Kconfig index 1ea12c64e4c9..e5d552b0823e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -2377,9 +2377,6 @@ source "kernel/Kconfig.locks" config ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE bool -config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE - bool - # It may be useful for an architecture to override the definitions of the # SYSCALL_DEFINE() and __SYSCALL_DEFINEx() macros in # and the COMPAT_ variants in , in particular to use a diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c index c32c32a2441e..f72a6ab3fac2 100644 --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -5,6 +5,9 @@ * membarrier system call */ #include "sched.h" +#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE +#include +#endif /* * The basic principle behind the regular memory barrier mode of membarrier() @@ -221,6 +224,7 @@ static void ipi_mb(void *info) smp_mb(); /* IPIs should be serializing but paranoid. */ } +#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE static void ipi_sync_core(void *info) { /* @@ -230,13 +234,14 @@ static void ipi_sync_core(void *info) * the big comment at the top of this file. * * A sync_core() would provide this guarantee, but - * sync_core_before_usermode() might end up being deferred until - * after membarrier()'s smp_mb(). + * membarrier_sync_core_before_usermode() might end up being deferred + * until after membarrier()'s smp_mb(). */ smp_mb(); /* IPIs should be serializing but paranoid. */ - sync_core_before_usermode(); + membarrier_sync_core_before_usermode(); } +#endif static void ipi_rseq(void *info) { @@ -368,12 +373,14 @@ static int membarrier_private_expedited(int flags, int cpu_id) smp_call_func_t ipi_func = ipi_mb; if (flags == MEMBARRIER_FLAG_SYNC_CORE) { - if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) +#ifndef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE return -EINVAL; +#else if (!(atomic_read(&mm->membarrier_state) & MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY)) return -EPERM; ipi_func = ipi_sync_core; +#endif } else if (flags == MEMBARRIER_FLAG_RSEQ) { if (!IS_ENABLED(CONFIG_RSEQ)) return -EINVAL;