From patchwork Wed Jul 12 15:15:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Leizhen (ThunderTown)" X-Patchwork-Id: 13310540 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A523EC001DC for ; Wed, 12 Jul 2023 15:16:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231882AbjGLPQX (ORCPT ); Wed, 12 Jul 2023 11:16:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35746 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232636AbjGLPQW (ORCPT ); Wed, 12 Jul 2023 11:16:22 -0400 Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E6FEB1BD7; Wed, 12 Jul 2023 08:16:20 -0700 (PDT) Received: from dggpemm500006.china.huawei.com (unknown [172.30.72.57]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4R1LsR4M9JzrRhJ; Wed, 12 Jul 2023 23:15:39 +0800 (CST) Received: from thunder-town.china.huawei.com (10.174.178.55) by dggpemm500006.china.huawei.com (7.185.36.236) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Wed, 12 Jul 2023 23:16:14 +0800 From: Zhen Lei To: "Paul E . McKenney" , Frederic Weisbecker , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , , CC: Zhen Lei Subject: [PATCH v2 1/2] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Date: Wed, 12 Jul 2023 23:15:56 +0800 Message-ID: <20230712151557.760-2-thunder.leizhen@huawei.com> X-Mailer: git-send-email 2.37.3.windows.1 In-Reply-To: <20230712151557.760-1-thunder.leizhen@huawei.com> References: <20230712151557.760-1-thunder.leizhen@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.178.55] X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To dggpemm500006.china.huawei.com (7.185.36.236) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org The kernel provides the only legal way, that is, function set_task_cpu(), to dynamically change the value of task_cpu(p). And in which the validity of the CPU ID is checked. set_task_cpu(): WARN_ON_ONCE(!cpu_online(new_cpu)); __set_task_cpu(p, new_cpu); Therefore, in normal cases, the value of task_cpu(gpk) is always valid. Then for the following snippet in rcu_check_gp_kthread_starvation(): cpu = gpk ? task_cpu(gpk) : -1; if (gpk) { if (cpu >= 0) { The above condition "if (gpk)" already ensures that gp_kthread is created, so the local variable 'cpu' cannot be negative here. Signed-off-by: Zhen Lei --- kernel/rcu/tree_stall.h | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index b10b8349bb2a48b..dcfaa3d5db2cbc7 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void) pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); pr_err("RCU grace-period kthread stack dump:\n"); sched_show_task(gpk); - if (cpu >= 0) { - if (cpu_is_offline(cpu)) { - pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); - } else { - pr_err("Stack dump where RCU GP kthread last ran:\n"); - dump_cpu_task(cpu); - } + if (cpu_is_offline(cpu)) { + pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); + } else { + pr_err("Stack dump where RCU GP kthread last ran:\n"); + dump_cpu_task(cpu); } wake_up_process(gpk); } From patchwork Wed Jul 12 15:15:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Leizhen (ThunderTown)" X-Patchwork-Id: 13310538 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE8BAEB64D9 for ; Wed, 12 Jul 2023 15:16:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231218AbjGLPQU (ORCPT ); Wed, 12 Jul 2023 11:16:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35714 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230086AbjGLPQT (ORCPT ); Wed, 12 Jul 2023 11:16:19 -0400 Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [45.249.212.189]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E1E091BD1; Wed, 12 Jul 2023 08:16:17 -0700 (PDT) Received: from dggpemm500006.china.huawei.com (unknown [172.30.72.54]) by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4R1LqR4nv8zPk5W; Wed, 12 Jul 2023 23:13:55 +0800 (CST) Received: from thunder-town.china.huawei.com (10.174.178.55) by dggpemm500006.china.huawei.com (7.185.36.236) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Wed, 12 Jul 2023 23:16:15 +0800 From: Zhen Lei To: "Paul E . McKenney" , Frederic Weisbecker , Neeraj Upadhyay , Joel Fernandes , Josh Triplett , Boqun Feng , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , Zqiang , , CC: Zhen Lei Subject: [PATCH v2 2/2] rcu: Don't dump the stalled CPU on where RCU GP kthread last ran twice Date: Wed, 12 Jul 2023 23:15:57 +0800 Message-ID: <20230712151557.760-3-thunder.leizhen@huawei.com> X-Mailer: git-send-email 2.37.3.windows.1 In-Reply-To: <20230712151557.760-1-thunder.leizhen@huawei.com> References: <20230712151557.760-1-thunder.leizhen@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.178.55] X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To dggpemm500006.china.huawei.com (7.185.36.236) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: rcu@vger.kernel.org The stacks of all stalled CPUs will be dumped in rcu_dump_cpu_stacks(). If the CPU on where RCU GP kthread last ran is stalled, its stack does not need to be dumped again. We can search the corresponding backtrace based on the printed CPU ID. For example: [ 87.328275] rcu: rcu_sched kthread starved for ... ->cpu=3 <--------| ... ... | [ 89.385007] NMI backtrace for cpu 3 <--------| [ 89.385179] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.0+ #22 <--| [ 89.385188] Hardware name: linux,dummy-virt (DT) [ 89.385196] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) [ 89.385204] pc : arch_cpu_idle+0x40/0xc0 [ 89.385211] lr : arch_cpu_idle+0x2c/0xc0 ... ... [ 89.385566] Call trace: [ 89.385574] arch_cpu_idle+0x40/0xc0 [ 89.385581] default_idle_call+0x100/0x450 [ 89.385589] cpuidle_idle_call+0x2f8/0x460 [ 89.385596] do_idle+0x1dc/0x3d0 [ 89.385604] cpu_startup_entry+0x5c/0xb0 [ 89.385613] secondary_start_kernel+0x35c/0x520 Signed-off-by: Zhen Lei Reviewed-by: Joel Fernandes (Google) --- kernel/rcu/tree_stall.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h index dcfaa3d5db2cbc7..cc884cd49e026a3 100644 --- a/kernel/rcu/tree_stall.h +++ b/kernel/rcu/tree_stall.h @@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void) data_race(READ_ONCE(rcu_state.gp_state)), gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu); if (gpk) { + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); + pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); pr_err("RCU grace-period kthread stack dump:\n"); sched_show_task(gpk); if (cpu_is_offline(cpu)) { pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); - } else { + } else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) { pr_err("Stack dump where RCU GP kthread last ran:\n"); dump_cpu_task(cpu); }