From patchwork Thu May 4 22:13:41 2023
X-Patchwork-Submitter: Douglas Anderson
X-Patchwork-Id: 13231893
From: Douglas Anderson
To: Petr Mladek, Andrew Morton
Cc: Sumit Garg, Mark Rutland, Matthias Kaehlcke, Stephane Eranian,
 Stephen Boyd, ricardo.neri@intel.com, Tzung-Bi Shih, Lecopzer Chen,
 kgdb-bugreport@lists.sourceforge.net, Masayoshi Mizuma, Guenter Roeck,
 Pingfan Liu, Andi Kleen, Ian Rogers,
 linux-arm-kernel@lists.infradead.org, linux-perf-users@vger.kernel.org,
 ito-yuichi@fujitsu.com, Randy Dunlap, Chen-Yu Tsai,
 christophe.leroy@csgroup.eu, davem@davemloft.net,
 sparclinux@vger.kernel.org, mpe@ellerman.id.au, Will Deacon,
 ravi.v.shankar@intel.com, npiggin@gmail.com,
 linuxppc-dev@lists.ozlabs.org, Marc Zyngier, Catalin Marinas,
 Daniel Thompson, Douglas Anderson
Subject: [PATCH v4 09/17] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check()
Date: Thu, 4 May 2023 15:13:41 -0700
Message-ID: <20230504151100.v4.9.I3a7d4dd8c23ac30ee0b607d77feb6646b64825c0@changeid>
In-Reply-To: <20230504221349.1535669-1-dianders@chromium.org>
References: <20230504221349.1535669-1-dianders@chromium.org>

In preparation for the buddy hardlockup detector where the CPU
checking for lockup might not be the currently running CPU, add a
"cpu" parameter to watchdog_hardlockup_check().

Signed-off-by: Douglas Anderson
---

Changes in v4:
- ("Add a "cpu" param to watchdog_hardlockup_check()") new for v4.

 include/linux/nmi.h    |  2 +-
 kernel/watchdog.c      | 47 ++++++++++++++++++++++++++++--------------
 kernel/watchdog_perf.c |  2 +-
 3 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index c6cb9bc5dc80..2c9ea1ba285c 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -88,7 +88,7 @@ static inline void hardlockup_detector_disable(void) {}
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
-void watchdog_hardlockup_check(struct pt_regs *regs);
+void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 #endif
 
 #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index f46669c1671d..367bea0167a5 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -92,14 +92,14 @@ static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
 static DEFINE_PER_CPU(bool, watchdog_hardlockup_processed);
 static unsigned long watchdog_hardlockup_dumped_stacks;
 
-static bool watchdog_hardlockup_is_lockedup(void)
+static bool watchdog_hardlockup_is_lockedup(unsigned int cpu)
 {
-	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
+	unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
 
-	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
+	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
 		return true;
 
-	__this_cpu_write(hrtimer_interrupts_saved, hrint);
+	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
 
 	return false;
 }
@@ -109,7 +109,7 @@ static void watchdog_hardlockup_interrupt_count(void)
 	__this_cpu_inc(hrtimer_interrupts);
 }
 
-void watchdog_hardlockup_check(struct pt_regs *regs)
+void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
 	/*
 	 * Check for a hardlockup by making sure the CPU's timer
@@ -117,35 +117,50 @@ void watchdog_hardlockup_check(struct pt_regs *regs)
 	 * fired multiple times before we overflow'd. If it hasn't
 	 * then this is a good indication the cpu is stuck
 	 */
-	if (watchdog_hardlockup_is_lockedup()) {
+	if (watchdog_hardlockup_is_lockedup(cpu)) {
 		unsigned int this_cpu = smp_processor_id();
+		struct cpumask backtrace_mask = *cpu_online_mask;
 
 		/* Only handle hardlockups once. */
-		if (__this_cpu_read(watchdog_hardlockup_processed))
+		if (per_cpu(watchdog_hardlockup_processed, cpu))
 			return;
 
-		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu);
+		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu);
 		print_modules();
 		print_irqtrace_events(current);
-		if (regs)
+		if (regs) {
 			show_regs(regs);
-		else
-			dump_stack();
+			cpumask_clear_cpu(cpu, &backtrace_mask);
+		} else {
+			/*
+			 * If the locked up CPU is different than the CPU we're
+			 * running on then we'll try to backtrace the CPU that
+			 * locked up and then exclude it from later backtraces.
+			 * If that fails or if we're running on the locked up
+			 * CPU, just do a normal backtrace.
+			 */
+			if (cpu != this_cpu && trigger_single_cpu_backtrace(cpu)) {
+				cpumask_clear_cpu(cpu, &backtrace_mask);
+			} else {
+				dump_stack();
+				cpumask_clear_cpu(this_cpu, &backtrace_mask);
+			}
+		}
 
 		/*
-		 * Perform all-CPU dump only once to avoid multiple hardlockups
-		 * generating interleaving traces
+		 * Perform multi-CPU dump only once to avoid multiple
+		 * hardlockups generating interleaving traces
 		 */
 		if (sysctl_hardlockup_all_cpu_backtrace &&
 		    !test_and_set_bit(0, &watchdog_hardlockup_dumped_stacks))
-			trigger_allbutself_cpu_backtrace();
+			trigger_cpumask_backtrace(&backtrace_mask);
 
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
 
-		__this_cpu_write(watchdog_hardlockup_processed, true);
+		per_cpu(watchdog_hardlockup_processed, cpu) = true;
 	} else {
-		__this_cpu_write(watchdog_hardlockup_processed, false);
+		per_cpu(watchdog_hardlockup_processed, cpu) = false;
 	}
 }
 
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 5f3651b87ee7..9be90b2a2ea7 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -120,7 +120,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
 	if (!watchdog_check_timestamp())
 		return;
 
-	watchdog_hardlockup_check(regs);
+	watchdog_hardlockup_check(smp_processor_id(), regs);
 }
 
 static int hardlockup_detector_event_create(void)
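
The effect of the new "cpu" argument may be easier to see outside the kernel. What follows is a minimal user-space sketch, not kernel code and not part of the patch: plain arrays stand in for the kernel's per-CPU variables, and NR_CPUS_SKETCH, timer_tick(), is_lockedup() and cpu_to_skip() are hypothetical names invented for illustration. is_lockedup() follows the shape of watchdog_hardlockup_is_lockedup(cpu) above, and cpu_to_skip() only mirrors the branch structure that decides which CPU is cleared from backtrace_mask.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count, illustration only */

/* Plain arrays standing in for the kernel's per-CPU variables. */
static unsigned long hrtimer_interrupts[NR_CPUS_SKETCH];
static unsigned long hrtimer_interrupts_saved[NR_CPUS_SKETCH];

/* Stand-in for the hrtimer callback: 'cpu' records that it is alive. */
static void timer_tick(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	hrtimer_interrupts[cpu]++;
}

/*
 * Returns true if the counter of 'cpu' has not moved since the previous
 * check; otherwise remembers the new value and returns false. Because
 * the CPU is an argument, any CPU can run this check on behalf of 'cpu'.
 */
static bool is_lockedup(unsigned int cpu)
{
	unsigned long hrint;

	assert(cpu < NR_CPUS_SKETCH);
	hrint = hrtimer_interrupts[cpu];

	if (hrtimer_interrupts_saved[cpu] == hrint)
		return true;

	hrtimer_interrupts_saved[cpu] = hrint;

	return false;
}

/*
 * Which CPU is left out of the later multi-CPU backtrace:
 * - registers available: the locked up CPU was already shown;
 * - remote CPU and its single-CPU backtrace worked: skip that CPU;
 * - otherwise the running CPU dumped its own stack, so skip it.
 */
static unsigned int cpu_to_skip(unsigned int cpu, unsigned int this_cpu,
				bool have_regs, bool single_backtrace_ok)
{
	if (have_regs)
		return cpu;
	if (cpu != this_cpu && single_backtrace_ok)
		return cpu;
	return this_cpu;
}
```

In this sketch a checker running as CPU 0 can call is_lockedup(1) between ticks of CPU 1. Two consecutive checks with no timer_tick(1) in between report a lockup, which is the situation the buddy detector is meant to catch.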