From patchwork Tue Aug  6 02:21:12 2024
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13754341
Date: Mon, 5 Aug 2024 20:21:12 -0600
In-Reply-To: <20240806022114.3320543-1-yuzhao@google.com>
References: <20240806022114.3320543-1-yuzhao@google.com>
Message-ID: <20240806022114.3320543-3-yuzhao@google.com>
Subject: [RFC PATCH 2/4] arm64: use IPIs to pause/resume remote CPUs
From: Yu Zhao
To: Catalin Marinas, Will Deacon
Cc: Andrew Morton, David Rientjes, Douglas Anderson, Frank van der Linden,
 Mark Rutland, Muchun Song, Nanyong Sun, Yang Shi,
 linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Yu Zhao

Use pseudo-NMI IPIs to pause remote CPUs for a short period of time,
and then reliably resume them when the local CPU exits critical
sections that preclude the execution of remote CPUs.

A typical example of such critical sections is BBM on kernel PTEs.
HugeTLB Vmemmap Optimization (HVO) on arm64 was disabled by
commit 060a2c92d1b6 ("arm64: mm: hugetlb: Disable
HUGETLB_PAGE_OPTIMIZE_VMEMMAP") due to the following reason:

  This is deemed UNPREDICTABLE by the Arm architecture without a
  break-before-make sequence (make the PTE invalid, TLBI, write the
  new valid PTE). However, such sequence is not possible since the
  vmemmap may be concurrently accessed by the kernel.

Supporting BBM on kernel PTEs is one of the approaches that can
potentially make arm64 support HVO.

Signed-off-by: Yu Zhao
---
 arch/arm64/include/asm/smp.h |   3 +
 arch/arm64/kernel/smp.c      | 110 +++++++++++++++++++++++++++++++++++
 2 files changed, 113 insertions(+)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 2510eec026f7..cffb0cfed961 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -133,6 +133,9 @@ bool cpus_are_stuck_in_kernel(void);
 extern void crash_smp_send_stop(void);
 extern bool smp_crash_stop_failed(void);
 
+void pause_remote_cpus(void);
+void resume_remote_cpus(void);
+
 #endif /* ifndef __ASSEMBLY__ */
 
 #endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 5e18fbcee9a2..aa80266e5c9d 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -68,16 +68,25 @@ enum ipi_msg_type {
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
+	IPI_CPU_PAUSE,
+#ifdef CONFIG_KEXEC_CORE
 	IPI_CPU_CRASH_STOP,
+#endif
+#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 	IPI_TIMER,
+#endif
+#ifdef CONFIG_IRQ_WORK
 	IPI_IRQ_WORK,
+#endif
 	NR_IPI,
 	/*
 	 * Any enum >= NR_IPI and < MAX_IPI is special and not tracable
 	 * with trace_ipi_*
 	 */
 	IPI_CPU_BACKTRACE = NR_IPI,
+#ifdef CONFIG_KGDB
 	IPI_KGDB_ROUNDUP,
+#endif
 	MAX_IPI
 };
 
@@ -821,11 +830,20 @@ static const char *ipi_types[MAX_IPI] __tracepoint_string = {
 	[IPI_RESCHEDULE]	= "Rescheduling interrupts",
 	[IPI_CALL_FUNC]		= "Function call interrupts",
 	[IPI_CPU_STOP]		= "CPU stop interrupts",
+	[IPI_CPU_PAUSE]		= "CPU pause interrupts",
+#ifdef CONFIG_KEXEC_CORE
 	[IPI_CPU_CRASH_STOP]	= "CPU stop (for crash dump) interrupts",
+#endif
+#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 	[IPI_TIMER]		= "Timer broadcast interrupts",
+#endif
+#ifdef CONFIG_IRQ_WORK
 	[IPI_IRQ_WORK]		= "IRQ work interrupts",
+#endif
 	[IPI_CPU_BACKTRACE]	= "CPU backtrace interrupts",
+#ifdef CONFIG_KGDB
 	[IPI_KGDB_ROUNDUP]	= "KGDB roundup interrupts",
+#endif
 };
 
 static void smp_cross_call(const struct cpumask *target, unsigned int ipinr);
@@ -884,6 +902,85 @@ void __noreturn panic_smp_self_stop(void)
 	local_cpu_stop();
 }
 
+static DEFINE_SPINLOCK(cpu_pause_lock);
+static cpumask_t paused_cpus;
+static cpumask_t resumed_cpus;
+
+static void pause_local_cpu(void)
+{
+	int cpu = smp_processor_id();
+
+	cpumask_clear_cpu(cpu, &resumed_cpus);
+	/*
+	 * Paired with pause_remote_cpus() to confirm that this CPU not only
+	 * will be paused but also can be reliably resumed.
+	 */
+	smp_wmb();
+	cpumask_set_cpu(cpu, &paused_cpus);
+	/* A typical example for sleep and wake-up functions. */
+	smp_mb();
+	while (!cpumask_test_cpu(cpu, &resumed_cpus)) {
+		wfe();
+		barrier();
+	}
+	barrier();
+	cpumask_clear_cpu(cpu, &paused_cpus);
+}
+
+void pause_remote_cpus(void)
+{
+	cpumask_t cpus_to_pause;
+
+	lockdep_assert_cpus_held();
+	lockdep_assert_preemption_disabled();
+
+	cpumask_copy(&cpus_to_pause, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &cpus_to_pause);
+
+	spin_lock(&cpu_pause_lock);
+
+	WARN_ON_ONCE(!cpumask_empty(&paused_cpus));
+
+	smp_cross_call(&cpus_to_pause, IPI_CPU_PAUSE);
+
+	while (!cpumask_equal(&cpus_to_pause, &paused_cpus)) {
+		cpu_relax();
+		barrier();
+	}
+	/*
+	 * Paired with pause_local_cpu() to confirm that all CPUs not only will
+	 * be paused but also can be reliably resumed.
+	 */
+	smp_rmb();
+	WARN_ON_ONCE(cpumask_intersects(&cpus_to_pause, &resumed_cpus));
+
+	spin_unlock(&cpu_pause_lock);
+}
+
+void resume_remote_cpus(void)
+{
+	cpumask_t cpus_to_resume;
+
+	lockdep_assert_cpus_held();
+	lockdep_assert_preemption_disabled();
+
+	cpumask_copy(&cpus_to_resume, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &cpus_to_resume);
+
+	spin_lock(&cpu_pause_lock);
+
+	cpumask_setall(&resumed_cpus);
+	/* A typical example for sleep and wake-up functions. */
+	smp_mb();
+	while (cpumask_intersects(&cpus_to_resume, &paused_cpus)) {
+		sev();
+		cpu_relax();
+		barrier();
+	}
+
+	spin_unlock(&cpu_pause_lock);
+}
+
 #ifdef CONFIG_KEXEC_CORE
 static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0);
 #endif
@@ -963,6 +1060,11 @@ static void do_handle_IPI(int ipinr)
 		local_cpu_stop();
 		break;
 
+	case IPI_CPU_PAUSE:
+		pause_local_cpu();
+		break;
+
+#ifdef CONFIG_KEXEC_CORE
 	case IPI_CPU_CRASH_STOP:
 		if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
 			ipi_cpu_crash_stop(cpu, get_irq_regs());
@@ -970,6 +1072,7 @@ static void do_handle_IPI(int ipinr)
 			unreachable();
 		}
 		break;
+#endif
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 	case IPI_TIMER:
@@ -991,9 +1094,11 @@ static void do_handle_IPI(int ipinr)
 		nmi_cpu_backtrace(get_irq_regs());
 		break;
 
+#ifdef CONFIG_KGDB
 	case IPI_KGDB_ROUNDUP:
 		kgdb_nmicallback(cpu, get_irq_regs());
 		break;
+#endif
 
 	default:
 		pr_crit("CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr);
@@ -1023,9 +1128,14 @@ static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
 
 	switch (ipi) {
 	case IPI_CPU_STOP:
+	case IPI_CPU_PAUSE:
+#ifdef CONFIG_KEXEC_CORE
 	case IPI_CPU_CRASH_STOP:
+#endif
 	case IPI_CPU_BACKTRACE:
+#ifdef CONFIG_KGDB
 	case IPI_KGDB_ROUNDUP:
+#endif
 		return true;
 	default:
 		return false;
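
For readers who want to experiment with the pause/resume handshake outside
the kernel, the following is a rough userspace analogy, not part of the
patch. It assumes POSIX threads standing in for remote CPUs and C11 atomics
standing in for the cpumasks and barriers. Every name in it (NR_WORKERS,
pause_req, paused, resumed, progress, pause_local, pause_remote,
resume_remote, run_demo) is hypothetical, and sched_yield() replaces
wfe()/sev()/cpu_relax(). It only illustrates the ordering: a worker
announces that it is paused, the initiator waits for every announcement
before entering its critical section, and resume waits until every worker
has left the paused state.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_WORKERS 3 /* hypothetical number of "remote CPUs" */

static atomic_bool pause_req[NR_WORKERS]; /* stands in for IPI_CPU_PAUSE */
static atomic_bool paused[NR_WORKERS];    /* stands in for paused_cpus */
static atomic_bool resumed[NR_WORKERS];   /* stands in for resumed_cpus */
static atomic_bool stop_workers;
static atomic_long progress;              /* bumped only by running workers */

/* Analogue of pause_local_cpu(): announce the pause, then wait for resume. */
static void pause_local(int id)
{
	atomic_store(&resumed[id], false);
	atomic_store(&paused[id], true);
	while (!atomic_load(&resumed[id]))
		sched_yield(); /* the patch waits with wfe() here */
	atomic_store(&paused[id], false);
}

static void *worker(void *arg)
{
	int id = (int)(long)arg;

	while (!atomic_load(&stop_workers)) {
		/* Consume a pending pause request, if any. */
		if (atomic_exchange(&pause_req[id], false))
			pause_local(id);
		atomic_fetch_add(&progress, 1);
	}
	return NULL;
}

/* Analogue of pause_remote_cpus(): request, then wait for all workers. */
static void pause_remote(void)
{
	for (int i = 0; i < NR_WORKERS; i++)
		atomic_store(&pause_req[i], true);
	for (int i = 0; i < NR_WORKERS; i++)
		while (!atomic_load(&paused[i]))
			sched_yield();
}

/* Analogue of resume_remote_cpus(): release, then wait for all to leave. */
static void resume_remote(void)
{
	for (int i = 0; i < NR_WORKERS; i++)
		atomic_store(&resumed[i], true);
	for (int i = 0; i < NR_WORKERS; i++)
		while (atomic_load(&paused[i]))
			sched_yield();
}

/* Returns true if no worker made progress while it was paused. */
static bool run_demo(void)
{
	pthread_t tid[NR_WORKERS];
	long before, after;

	atomic_store(&stop_workers, false);
	for (long i = 0; i < NR_WORKERS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)i);

	pause_remote();
	before = atomic_load(&progress);
	for (int i = 0; i < 1000; i++)
		sched_yield(); /* the "critical section" */
	after = atomic_load(&progress);
	resume_remote();

	atomic_store(&stop_workers, true);
	for (int i = 0; i < NR_WORKERS; i++)
		pthread_join(tid[i], NULL);
	return before == after;
}
```

Calling run_demo() from a test harness should return true, since workers
only bump the counter outside pause_local(). The sketch uses sequentially
consistent atomics throughout for simplicity, whereas the patch relies on
explicit smp_wmb()/smp_rmb()/smp_mb() pairs, and it has no counterpart to
cpu_pause_lock or the lockdep assertions.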