From patchwork Tue Jun 25 23:07:22 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Doug Anderson
X-Patchwork-Id: 13712122
From: Douglas Anderson
To: Catalin Marinas, Will Deacon
Cc: Mark Rutland, Sumit Garg, Yu Zhao, Misono Tomohiro, Stephen Boyd,
 Chen-Yu Tsai, Marc Zyngier, Daniel Thompson, Douglas Anderson,
 D Scott Phillips, Frederic Weisbecker, "Guilherme G. Piccoli",
 James Morse, Kees Cook, Tony Luck,
 linux-arm-kernel@lists.infradead.org, linux-hardening@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH v2] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first
Date: Tue, 25 Jun 2024 16:07:22 -0700
Message-ID: <20240625160718.v2.1.Id4817adef610302554b8aa42b090d57270dc119c@changeid>
X-Mailer: git-send-email 2.45.2.741.gdbec12cfda-goog

When testing hard lockup handling on my sc7180-trogdor-lazor device
with pseudo-NMI enabled, with serial console enabled and with kgdb
disabled, I found that the stack crawls printed to the serial console
ended up as a jumbled mess. After rebooting, the pstore-based console
looked fine though. Also, enabling kgdb to trap the panic made the
console look fine and avoided the mess.

After a bit of tracking down, I came to the conclusion that this was
what was happening:
1. The panic path was stopping all other CPUs with
   panic_other_cpus_shutdown().
2. At least one of those other CPUs was in the middle of printing to
   the serial console and holding the console port's lock, which is
   grabbed with "irqsave". ...but since we were stopping with an NMI
   we didn't care about the "irqsave" and interrupted anyway.
3. Since we stopped the CPU while it was holding the lock it would
   never release it.
4. All future calls to output to the console would end up failing to
   get the lock in qcom_geni_serial_console_write(). This isn't
   _totally_ unexpected at panic time but it's a code path that's not
   well tested, hard to get right, and apparently doesn't work
   terribly well on the Qualcomm geni serial driver.

The Qualcomm geni serial driver was fixed to be a bit better in commit
9e957a155005 ("serial: qcom-geni: Don't cancel/abort if we can't get
the port lock") but it's nice not to get into this situation in the
first place.

Taking a page from what x86 appears to do in native_stop_other_cpus(),
do this:
1. First, try to stop other CPUs with a normal IPI and wait a second.
   This gives them a chance to leave critical sections.
2. If CPUs fail to stop then retry with an NMI, but give a much lower
   timeout since there's no good reason for a CPU not to react quickly
   to an NMI.

This works well and avoids the corrupted console and (presumably)
could help avoid other similar issues.

In order to do this, we need to do a little re-organization of our
IPIs since we don't have any more free IDs. Do what was suggested in
previous conversations and combine "stop" and "crash stop". That frees
up an IPI so now we can have a "stop" and "stop NMI".

In order to do this we also need a slight change in the way we keep
track of which CPUs still need to be stopped. We need to know
specifically which CPUs haven't stopped yet when we fall back to NMI
but in the "crash stop" case the "cpu_online_mask" isn't updated as
CPUs go down. This is why that code path had an atomic of the number
of CPUs left. Solve this by also updating the "cpu_online_mask" for
crash stops.

All of the above lets us combine the logic for "stop" and "crash stop"
code, which appeared to have a bunch of arbitrary implementation
differences.

Aside from the above change where we try a normal IPI and then an NMI,
the combined function has a few subtle differences:
* In the normal smp_send_stop(), if we fail to stop one or more CPUs
  then we won't include the current CPU (the one running
  smp_send_stop()) in the error message.
* In crash_smp_send_stop(), if we fail to stop some CPUs we'll print
  the CPUs that we failed to stop instead of printing all _but_ the
  current running CPU.
* In crash_smp_send_stop(), we will now only print "SMP: stopping
  secondary CPUs" if (system_state <= SYSTEM_RUNNING).

Fixes: d7402513c935 ("arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI")
Signed-off-by: Douglas Anderson
---
I'm not set up to test the crash_smp_send_stop(). I made sure it
compiled and hacked the panic() method to call it, but I haven't
actually run kexec. Hopefully others can confirm that it's working for
them.

v1: https://lore.kernel.org/r/20231207170251.1.Id4817adef610302554b8aa42b090d57270dc119c@changeid

Changes in v2:
- Update commit message to point to Qualcomm serial driver fix.
- Use a test-and-set to prevent stop code from running twice.
- Move mask clearing until after crash_save_cpu().
- Use local_daif_mask() in ipi_cpu_crash_stop().
- Don't use a new mask, just have crash case update online CPUs.
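For readers who want to reason about the two-stage stop without booting
a kernel, the flow described in the commit message can be modeled in
user space roughly as below. This is only a sketch under loud
assumptions: struct model_cpu, model_reset(), others_online(),
send_stop() and stop_others() are hypothetical names that exist only in
this example, a plain bitmask stands in for cpu_online_mask, delivery is
instantaneous (so the one second and 10 ms timeouts are not modeled),
"irqs_masked" is treated as permanent, and an NMI is assumed to always
be available.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4u

/* Hypothetical model of one CPU; none of this is kernel API. */
struct model_cpu {
	bool online;		/* stand-in for the CPU's bit in cpu_online_mask */
	bool irqs_masked;	/* e.g. inside a section entered with "irqsave" */
	bool wedged;		/* does not react even to an NMI */
};

enum stop_kind { STOP_IRQ, STOP_NMI };

static struct model_cpu cpus[NCPUS];

/* Bring every modeled CPU online with no special state. */
static void model_reset(void)
{
	for (unsigned int i = 0; i < NCPUS; i++)
		cpus[i] = (struct model_cpu){ .online = true };
}

/* Bitmask of online CPUs other than 'self'. */
static unsigned int others_online(unsigned int self)
{
	unsigned int mask = 0;

	for (unsigned int i = 0; i < NCPUS; i++)
		if (i != self && cpus[i].online)
			mask |= 1u << i;
	return mask;
}

/* Deliver a stop request; a CPU that accepts it marks itself offline. */
static void send_stop(unsigned int mask, enum stop_kind kind)
{
	for (unsigned int i = 0; i < NCPUS; i++) {
		if (!(mask & (1u << i)) || !cpus[i].online)
			continue;
		if (cpus[i].wedged)
			continue;	/* ignores everything */
		if (kind == STOP_IRQ && cpus[i].irqs_masked)
			continue;	/* a normal IPI cannot get through */
		cpus[i].online = false;
	}
}

/* Normal IPI first, NMI only for the stragglers; returns CPUs still up. */
static unsigned int stop_others(unsigned int self)
{
	unsigned int mask;

	assert(self < NCPUS);
	mask = others_online(self);
	if (mask == 0)
		return 0;

	send_stop(mask, STOP_IRQ);
	mask &= others_online(self);	/* keep only CPUs that did not stop */
	if (mask != 0) {
		send_stop(mask, STOP_NMI);
		mask &= others_online(self);
	}
	return mask;
}
```

In this model a CPU with IRQs masked only goes down on the NMI retry,
and a wedged CPU is the only one left in the returned mask, which
corresponds to the set the patch prints in its "failed to stop
secondary CPUs" warning.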
 arch/arm64/kernel/smp.c | 138 ++++++++++++++++++++++------------------
 1 file changed, 75 insertions(+), 63 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 31c8b3094dd7..254619245f97 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -71,7 +71,7 @@ enum ipi_msg_type {
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
-	IPI_CPU_CRASH_STOP,
+	IPI_CPU_STOP_NMI,
 	IPI_TIMER,
 	IPI_IRQ_WORK,
 	NR_IPI,
@@ -88,6 +88,8 @@ static int ipi_irq_base __ro_after_init;
 static int nr_ipi __ro_after_init = NR_IPI;
 static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init;
 
+static bool crash_stop;
+
 static void ipi_setup(int cpu);
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -771,7 +773,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
 	[IPI_RESCHEDULE]	= "Rescheduling interrupts",
 	[IPI_CALL_FUNC]		= "Function call interrupts",
 	[IPI_CPU_STOP]		= "CPU stop interrupts",
-	[IPI_CPU_CRASH_STOP]	= "CPU stop (for crash dump) interrupts",
+	[IPI_CPU_STOP_NMI]	= "CPU stop NMIs",
 	[IPI_TIMER]		= "Timer broadcast interrupts",
 	[IPI_IRQ_WORK]		= "IRQ work interrupts",
 };
@@ -813,9 +815,9 @@ void arch_irq_work_raise(void)
 }
 #endif
 
-static void __noreturn local_cpu_stop(void)
+static void __noreturn local_cpu_stop(unsigned int cpu)
 {
-	set_cpu_online(smp_processor_id(), false);
+	set_cpu_online(cpu, false);
 
 	local_daif_mask();
 	sdei_mask_local_cpu();
@@ -829,21 +831,26 @@ static void __noreturn local_cpu_stop(void)
  */
 void __noreturn panic_smp_self_stop(void)
 {
-	local_cpu_stop();
+	local_cpu_stop(smp_processor_id());
 }
 
-#ifdef CONFIG_KEXEC_CORE
-static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0);
-#endif
-
 static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
 {
 #ifdef CONFIG_KEXEC_CORE
+	/*
+	 * Use local_daif_mask() instead of local_irq_disable() to make sure
+	 * that pseudo-NMIs are disabled. The "crash stop" code starts with
+	 * an IRQ and falls back to NMI (which might be pseudo). If the IRQ
+	 * finally goes through right as we're timing out then the NMI could
+	 * interrupt us. It's better to prevent the NMI and let the IRQ
+	 * finish since the pt_regs will be better.
+	 */
+	local_daif_mask();
+
 	crash_save_cpu(regs, cpu);
 
-	atomic_dec(&waiting_for_crash_ipi);
+	set_cpu_online(cpu, false);
 
-	local_irq_disable();
 	sdei_mask_local_cpu();
 
 	if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
@@ -908,14 +915,12 @@ static void do_handle_IPI(int ipinr)
 		break;
 
 	case IPI_CPU_STOP:
-		local_cpu_stop();
-		break;
-
-	case IPI_CPU_CRASH_STOP:
-		if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
+	case IPI_CPU_STOP_NMI:
+		if (IS_ENABLED(CONFIG_KEXEC_CORE) && crash_stop) {
 			ipi_cpu_crash_stop(cpu, get_irq_regs());
-
 			unreachable();
+		} else {
+			local_cpu_stop(cpu);
 		}
 		break;
 
@@ -970,8 +975,7 @@ static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
 		return false;
 
 	switch (ipi) {
-	case IPI_CPU_STOP:
-	case IPI_CPU_CRASH_STOP:
+	case IPI_CPU_STOP_NMI:
 	case IPI_CPU_BACKTRACE:
 	case IPI_KGDB_ROUNDUP:
 		return true;
@@ -1084,79 +1088,87 @@ static inline unsigned int num_other_online_cpus(void)
 
 void smp_send_stop(void)
 {
+	static unsigned long stop_in_progress;
+	cpumask_t mask;
 	unsigned long timeout;
 
-	if (num_other_online_cpus()) {
-		cpumask_t mask;
+	/*
+	 * If this cpu is the only one alive at this point in time, online or
+	 * not, there are no stop messages to be sent around, so just back out.
+	 */
+	if (num_other_online_cpus() == 0)
+		goto skip_ipi;
 
-		cpumask_copy(&mask, cpu_online_mask);
-		cpumask_clear_cpu(smp_processor_id(), &mask);
+	/* Only proceed if this is the first CPU to reach this code */
+	if (test_and_set_bit(0, &stop_in_progress))
+		return;
 
-		if (system_state <= SYSTEM_RUNNING)
-			pr_crit("SMP: stopping secondary CPUs\n");
-		smp_cross_call(&mask, IPI_CPU_STOP);
-	}
+	cpumask_copy(&mask, cpu_online_mask);
+	cpumask_clear_cpu(smp_processor_id(), &mask);
 
-	/* Wait up to one second for other CPUs to stop */
+	if (system_state <= SYSTEM_RUNNING)
+		pr_crit("SMP: stopping secondary CPUs\n");
+
+	/*
+	 * Start with a normal IPI and wait up to one second for other CPUs to
+	 * stop. We do this first because it gives other processors a chance
+	 * to exit critical sections / drop locks and makes the rest of the
+	 * stop process (especially console flush) more robust.
+	 */
+	smp_cross_call(&mask, IPI_CPU_STOP);
 	timeout = USEC_PER_SEC;
 	while (num_other_online_cpus() && timeout--)
 		udelay(1);
 
-	if (num_other_online_cpus())
+	/*
+	 * If CPUs are still online, try an NMI. There's no excuse for this to
+	 * be slow, so we only give them an extra 10 ms to respond.
+	 */
+	if (num_other_online_cpus() && ipi_should_be_nmi(IPI_CPU_STOP_NMI)) {
+		cpumask_and(&mask, &mask, cpu_online_mask);
+
+		pr_info("SMP: retry stop with NMI for CPUs %*pbl\n",
+			cpumask_pr_args(&mask));
+
+		smp_cross_call(&mask, IPI_CPU_STOP_NMI);
+		timeout = USEC_PER_MSEC * 10;
+		while (num_other_online_cpus() && timeout--)
+			udelay(1);
+	}
+
+	if (num_other_online_cpus()) {
+		cpumask_and(&mask, &mask, cpu_online_mask);
+
 		pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
-			cpumask_pr_args(cpu_online_mask));
+			cpumask_pr_args(&mask));
+	}
 
+skip_ipi:
 	sdei_mask_local_cpu();
 }
 
 #ifdef CONFIG_KEXEC_CORE
 void crash_smp_send_stop(void)
 {
-	static int cpus_stopped;
-	cpumask_t mask;
-	unsigned long timeout;
-
 	/*
 	 * This function can be called twice in panic path, but obviously
 	 * we execute this only once.
+	 *
+	 * We use this same boolean to tell whether the IPI we send was a
+	 * stop or a "crash stop".
 	 */
-	if (cpus_stopped)
+	if (crash_stop)
 		return;
+	crash_stop = 1;
 
-	cpus_stopped = 1;
+	smp_send_stop();
 
-	/*
-	 * If this cpu is the only one alive at this point in time, online or
-	 * not, there are no stop messages to be sent around, so just back out.
-	 */
-	if (num_other_online_cpus() == 0)
-		goto skip_ipi;
-
-	cpumask_copy(&mask, cpu_online_mask);
-	cpumask_clear_cpu(smp_processor_id(), &mask);
-
-	atomic_set(&waiting_for_crash_ipi, num_other_online_cpus());
-
-	pr_crit("SMP: stopping secondary CPUs\n");
-	smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
-
-	/* Wait up to one second for other CPUs to stop */
-	timeout = USEC_PER_SEC;
-	while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
-		udelay(1);
-
-	if (atomic_read(&waiting_for_crash_ipi) > 0)
-		pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
-			cpumask_pr_args(&mask));
-
-skip_ipi:
-	sdei_mask_local_cpu();
 	sdei_handler_abort();
 }
 
 bool smp_crash_stop_failed(void)
 {
-	return (atomic_read(&waiting_for_crash_ipi) > 0);
+	return num_other_online_cpus() != 0;
 }
 #endif
 