From patchwork Mon Oct 21 04:22:16 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13843543
Date: Sun, 20 Oct 2024 22:22:16 -0600
In-Reply-To: <20241021042218.746659-1-yuzhao@google.com>
References: <20241021042218.746659-1-yuzhao@google.com>
X-Mailer: git-send-email 2.47.0.rc1.288.g06298d1525-goog
Message-ID: <20241021042218.746659-5-yuzhao@google.com>
Subject: [PATCH v1 4/6] arm64: broadcast IPIs to pause remote CPUs
From: Yu Zhao
To: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song,
 Thomas Gleixner, Will Deacon
Cc: Douglas Anderson, Mark Rutland, Nanyong Sun,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Yu Zhao

Broadcast pseudo-NMI IPIs to pause remote CPUs for a short period of
time, and then reliably resume them when the local CPU exits critical
sections that preclude the execution of remote CPUs.

A typical example of such critical sections is break-before-make (BBM)
on kernel PTEs. HugeTLB Vmemmap Optimization (HVO) on arm64 was
disabled by commit 060a2c92d1b6 ("arm64: mm: hugetlb: Disable
HUGETLB_PAGE_OPTIMIZE_VMEMMAP") due to the following reason:

  This is deemed UNPREDICTABLE by the Arm architecture without a
  break-before-make sequence (make the PTE invalid, TLBI, write the
  new valid PTE). However, such sequence is not possible since the
  vmemmap may be concurrently accessed by the kernel.

Supporting BBM on kernel PTEs is one of the approaches that can make
HVO theoretically safe on arm64.

Note that it is still possible for the paused CPUs to perform
speculative translations. Such translations would cause spurious
kernel PFs, which should be properly handled by
is_spurious_el1_translation_fault().

Signed-off-by: Yu Zhao
---
 arch/arm64/include/asm/smp.h |  3 ++
 arch/arm64/kernel/smp.c      | 92 +++++++++++++++++++++++++++++++++---
 2 files changed, 88 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 2510eec026f7..cffb0cfed961 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -133,6 +133,9 @@ bool cpus_are_stuck_in_kernel(void);
 extern void crash_smp_send_stop(void);
 extern bool smp_crash_stop_failed(void);
 
+void pause_remote_cpus(void);
+void resume_remote_cpus(void);
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 3b3f6b56e733..68829c6de1b1 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -85,7 +85,12 @@ static int ipi_irq_base __ro_after_init;
 static int nr_ipi __ro_after_init = NR_IPI;
 static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init;
 
-static bool crash_stop;
+enum {
+	SEND_STOP = BIT(0),
+	CRASH_STOP = BIT(1),
+};
+
+static unsigned long stop_in_progress;
 
 static void ipi_setup(int cpu);
 
@@ -917,6 +922,79 @@ static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs
 #endif
 }
 
+static DEFINE_SPINLOCK(cpu_pause_lock);
+static cpumask_t paused_cpus;
+static cpumask_t resumed_cpus;
+
+static void pause_local_cpu(void)
+{
+	int cpu = smp_processor_id();
+
+	cpumask_clear_cpu(cpu, &resumed_cpus);
+	/*
+	 * Paired with pause_remote_cpus() to confirm that this CPU not only
+	 * will be paused but also can be reliably resumed.
+	 */
+	smp_wmb();
+	cpumask_set_cpu(cpu, &paused_cpus);
+	/* paused_cpus must be set before waiting on resumed_cpus. */
+	barrier();
+	while (!cpumask_test_cpu(cpu, &resumed_cpus))
+		cpu_relax();
+	/* A typical example for sleep and wake-up functions. */
+	smp_mb();
+	cpumask_clear_cpu(cpu, &paused_cpus);
+}
+
+void pause_remote_cpus(void)
+{
+	cpumask_t cpus_to_pause;
+
+	lockdep_assert_cpus_held();
+	lockdep_assert_preemption_disabled();
+
+	cpumask_copy(&cpus_to_pause, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &cpus_to_pause);
+
+	spin_lock(&cpu_pause_lock);
+
+	WARN_ON_ONCE(!cpumask_empty(&paused_cpus));
+
+	smp_cross_call(&cpus_to_pause, IPI_CPU_STOP_NMI);
+
+	while (!cpumask_equal(&cpus_to_pause, &paused_cpus))
+		cpu_relax();
+	/*
+	 * Paired with pause_local_cpu() to confirm that all CPUs not only will
+	 * be paused but also can be reliably resumed.
+	 */
+	smp_rmb();
+	WARN_ON_ONCE(cpumask_intersects(&cpus_to_pause, &resumed_cpus));
+
+	spin_unlock(&cpu_pause_lock);
+}
+
+void resume_remote_cpus(void)
+{
+	cpumask_t cpus_to_resume;
+
+	lockdep_assert_cpus_held();
+	lockdep_assert_preemption_disabled();
+
+	cpumask_copy(&cpus_to_resume, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &cpus_to_resume);
+
+	spin_lock(&cpu_pause_lock);
+
+	cpumask_setall(&resumed_cpus);
+	/* A typical example for sleep and wake-up functions. */
+	smp_mb();
+	while (cpumask_intersects(&cpus_to_resume, &paused_cpus))
+		cpu_relax();
+
+	spin_unlock(&cpu_pause_lock);
+}
+
 static void arm64_backtrace_ipi(cpumask_t *mask)
 {
 	__ipi_send_mask(ipi_desc[IPI_CPU_BACKTRACE], mask);
@@ -970,7 +1048,9 @@ static void do_handle_IPI(int ipinr)
 
 	case IPI_CPU_STOP:
 	case IPI_CPU_STOP_NMI:
-		if (IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop) {
+		if (!test_bit(SEND_STOP, &stop_in_progress)) {
+			pause_local_cpu();
+		} else if (test_bit(CRASH_STOP, &stop_in_progress)) {
 			ipi_cpu_crash_stop(cpu, get_irq_regs());
 			unreachable();
 		} else {
@@ -1142,7 +1222,6 @@ static inline unsigned int num_other_online_cpus(void)
 
 void smp_send_stop(void)
 {
-	static unsigned long stop_in_progress;
 	cpumask_t mask;
 	unsigned long timeout;
 
@@ -1154,7 +1233,7 @@ void smp_send_stop(void)
 		goto skip_ipi;
 
 	/* Only proceed if this is the first CPU to reach this code */
-	if (test_and_set_bit(0, &stop_in_progress))
+	if (test_and_set_bit(SEND_STOP, &stop_in_progress))
 		return;
 
 	/*
@@ -1230,12 +1309,11 @@ void crash_smp_send_stop(void)
 	 * This function can be called twice in panic path, but obviously
 	 * we execute this only once.
 	 *
-	 * We use this same boolean to tell whether the IPI we send was a
+	 * We use the CRASH_STOP bit to tell whether the IPI we send was a
 	 * stop or a "crash stop".
 	 */
-	if (crash_stop)
+	if (test_and_set_bit(CRASH_STOP, &stop_in_progress))
 		return;
-	crash_stop = 1;
 
 	smp_send_stop();
 
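
For readers following the handshake between pause_local_cpu(),
pause_remote_cpus() and resume_remote_cpus(), here is a minimal userspace
sketch of the same ordering. It is not kernel code. It assumes pthreads and
C11 sequentially consistent atomics in place of the patch's
smp_wmb()/smp_rmb()/smp_mb() pairing, so it models the order of the mask
updates rather than the barrier choices. The ipi_pending[] flags stand in
for smp_cross_call(&cpus_to_pause, IPI_CPU_STOP_NMI), cpu_pause_lock is
omitted because there is only one controller, and every name below
(NWORKERS, pause_local(), run_pause_resume_demo(), and so on) is
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NWORKERS 3	/* hypothetical number of "remote CPUs" */
#define ALL_WORKERS ((1u << NWORKERS) - 1)

static atomic_uint paused_mask;			/* models paused_cpus */
static atomic_uint resumed_mask;		/* models resumed_cpus */
static atomic_bool ipi_pending[NWORKERS];	/* stands in for the IPI */
static atomic_ulong progress[NWORKERS];		/* work done per worker */
static atomic_bool stop_workers;

/* Worker side, modelled on pause_local_cpu(). */
static void pause_local(unsigned int id)
{
	unsigned int bit = 1u << id;

	atomic_fetch_and(&resumed_mask, ~bit);	/* clear resumed first */
	atomic_fetch_or(&paused_mask, bit);	/* then announce "paused" */
	while (!(atomic_load(&resumed_mask) & bit))
		sched_yield();
	atomic_fetch_and(&paused_mask, ~bit);	/* acknowledge the resume */
}

static void *worker(void *arg)
{
	unsigned int id = (unsigned int)(uintptr_t)arg;

	while (!atomic_load(&stop_workers)) {
		if (atomic_exchange(&ipi_pending[id], false))
			pause_local(id);
		else
			atomic_fetch_add(&progress[id], 1);
	}
	return NULL;
}

/* Controller side, modelled on pause_remote_cpus(). */
static void pause_remote(void)
{
	for (unsigned int i = 0; i < NWORKERS; i++)
		atomic_store(&ipi_pending[i], true);
	while (atomic_load(&paused_mask) != ALL_WORKERS)
		sched_yield();
}

/* Controller side, modelled on resume_remote_cpus(). */
static void resume_remote(void)
{
	atomic_fetch_or(&resumed_mask, ALL_WORKERS);
	while (atomic_load(&paused_mask) & ALL_WORKERS)
		sched_yield();
}

static unsigned long total_progress(void)
{
	unsigned long sum = 0;

	for (unsigned int i = 0; i < NWORKERS; i++)
		sum += atomic_load(&progress[i]);
	return sum;
}

/* Returns how many rounds saw a worker make progress while paused. */
int run_pause_resume_demo(int rounds)
{
	pthread_t threads[NWORKERS];
	int violations = 0;

	atomic_store(&stop_workers, false);
	atomic_store(&paused_mask, 0);
	atomic_store(&resumed_mask, 0);
	for (unsigned int i = 0; i < NWORKERS; i++) {
		atomic_store(&ipi_pending[i], false);
		int rc = pthread_create(&threads[i], NULL, worker,
					(void *)(uintptr_t)i);
		assert(rc == 0);
		(void)rc;
	}

	for (int r = 0; r < rounds; r++) {
		pause_remote();
		unsigned long before = total_progress();
		for (int k = 0; k < 100; k++)	/* the "critical section" */
			sched_yield();
		if (total_progress() != before)
			violations++;
		resume_remote();
	}

	atomic_store(&stop_workers, true);
	for (unsigned int i = 0; i < NWORKERS; i++)
		pthread_join(threads[i], NULL);
	return violations;
}
```

Calling run_pause_resume_demo(50) from a test program built with
`cc -pthread` should return 0 if no worker advanced its counter while
paused. In the kernel, the equivalent caller would hold cpus_read_lock()
and disable preemption around the pause/resume pair, as the lockdep
assertions in the patch require.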