From patchwork Tue Sep 24 09:45:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "liwei (GF)" X-Patchwork-Id: 13810616 Received: from szxga05-in.huawei.com (szxga05-in.huawei.com [45.249.212.191]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6511B81AB6 for ; Tue, 24 Sep 2024 09:53:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.191 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727171626; cv=none; b=OKm0ppRxHlSIXCWwcMmbnZ+gjVPxQJZmsNiHpgpy/A+Hr3fU6yR0CJpxjze53fz1fo1kqnXNCQqkiOlLaFBh1L1X4HQJj2JgDvU58KXpxkuMif+r+THwkFm04mfrzujaIKUyWxsjG5hifufpZbaKfy2/IoTnTbXkG04uhHKVe6s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727171626; c=relaxed/simple; bh=Lc2e0zRgXM5T4imJ4gDTFpO2qeQWnRCIRERqVJbVRBQ=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Og1+wriEgu3RJ7QPR7CQ2ag0h+ue7znu6djWmysrPDx0co22ZDXH8ATJQagib48nlVjr7htVEhKgSdKIAoe3UM9D6cWCuqRSTGGIXDfy0V5XuLb1lYoNsDmPtrqS4MIpcf24XKyJ7wvpkBCp0KMpUbH316xdKyhxZ74f27oQR2I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=45.249.212.191 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.19.88.234]) by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4XCZt03Ssmz2QTxJ; Tue, 24 Sep 2024 17:52:56 +0800 (CST) Received: from kwepemd100024.china.huawei.com (unknown [7.221.188.41]) by mail.maildlp.com (Postfix) with ESMTPS id 4EBAF140119; Tue, 24 Sep 2024 17:53:42 +0800 (CST) Received: from 
ubuntu-20-04.huawei.com (10.175.103.91) by kwepemd100024.china.huawei.com (7.221.188.41) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 24 Sep 2024 17:53:41 +0800 From: Wei Li To: Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Daniel Bristot de Oliveira CC: , Subject: [PATCH 1/5] tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline Date: Tue, 24 Sep 2024 17:45:11 +0800 Message-ID: <20240924094515.3561410-2-liwei391@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240924094515.3561410-1-liwei391@huawei.com> References: <20240924094515.3561410-1-liwei391@huawei.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To kwepemd100024.china.huawei.com (7.221.188.41)

osnoise_hotplug_workfn() is the asynchronous online callback for
"trace/osnoise:online". When a CPU goes online and offline repeatedly,
the work may become congested and be invoked multiple times after a
single online event. This leads to a kthread leak and timer corruption.

Add a check in start_kthread() to prevent this situation.

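As an illustration only, the two idioms this patch relies on (do not start a worker when the slot is already occupied, and clear the slot with an atomic exchange so that only one caller ends up owning the pointer) can be sketched in userspace C11. Everything named here (struct worker, slots, start_worker(), claim_worker()) is hypothetical and not part of trace_osnoise.c. The sketch also swaps in a compare-and-exchange on the start side, which is stronger than the plain per_cpu() load added to start_kthread() by this patch.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU kthread handle. */
struct worker { int id; };

#define NR_SLOTS 4

/* One slot per "CPU"; NULL means no worker is registered there. */
static _Atomic(struct worker *) slots[NR_SLOTS];

/*
 * Register @w for @cpu unless a worker is already registered.
 * Returns 1 if @w was installed, 0 if the slot was already occupied.
 */
static int start_worker(unsigned int cpu, struct worker *w)
{
	struct worker *expected = NULL;

	/* Install @w only if the slot is still empty. */
	return atomic_compare_exchange_strong(&slots[cpu], &expected, w);
}

/*
 * Atomically take ownership of the worker for @cpu and clear the slot.
 * For a given registered worker, only one caller sees a non-NULL result.
 */
static struct worker *claim_worker(unsigned int cpu)
{
	return atomic_exchange_explicit(&slots[cpu], (struct worker *)NULL,
					memory_order_relaxed);
}
```

With this shape, a second start request for the same slot is a no-op, and two racing stop requests cannot both receive the same pointer, so the worker is stopped at most once.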
Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations")
Signed-off-by: Wei Li
---
 kernel/trace/trace_osnoise.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 7e75c1214b36..934a14bc72e6 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2007,6 +2007,10 @@ static int start_kthread(unsigned int cpu)
 	void *main = osnoise_main;
 	char comm[24];
 
+	/* Do not start a new thread if it is already running */
+	if (per_cpu(per_cpu_osnoise_var, cpu).kthread)
+		return 0;
+
 	if (timerlat_enabled()) {
 		snprintf(comm, 24, "timerlat/%d", cpu);
 		main = timerlat_main;
@@ -2061,11 +2065,10 @@ static int start_per_cpu_kthreads(void)
 		if (cpumask_test_and_clear_cpu(cpu, &kthread_cpumask)) {
 			struct task_struct *kthread;
 
-			kthread = per_cpu(per_cpu_osnoise_var, cpu).kthread;
+			kthread = xchg_relaxed(&(per_cpu(per_cpu_osnoise_var, cpu).kthread), NULL);
 			if (!WARN_ON(!kthread))
 				kthread_stop(kthread);
 		}
-		per_cpu(per_cpu_osnoise_var, cpu).kthread = NULL;
 	}
 
 	for_each_cpu(cpu, current_mask) {

From patchwork Tue Sep 24 09:45:12 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "liwei (GF)" X-Patchwork-Id: 13810619 Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DB5B6482D8 for ; Tue, 24 Sep 2024 09:53:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.187 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727171632; cv=none; b=AIt8OghAPWjTruQrVSG0i5Pfg1NyZiPUHaYE8kOg+mB0FZrY7MNVZ3chbMMYOKaUNZk8CUrvf5/agRVB4/9XhKtsscSJqnIAZBix8Lt8xnKrs45n3GT0TC/3mDuavHr1I4scIx0k/Tl5zL2OGgZ122Mb0x0thbU+/umQQBT/7m8= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1727171632; c=relaxed/simple; bh=0Ui2ljAR/w50xPuEf3VpL87GSUBeC8ecWbJvjJ8Gd4o=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=WzXIPY3zG6chhTBOjMSyYs6X1f9nzQmOBgVxDpItbD2q06FCIym+yBfi3lDL7joQQJBNcR+YgTQKvcq3bBW/TSC7Hn5RSWG2cNipJvNUaGNZ8gYCf8I9Xj5MUaGbh8/JcOFNChPT2MGTWS2SrYNwnpqqp/UUJ5+y/kmo0aoCs30= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=45.249.212.187 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.19.88.105]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4XCZsl0Bc9zySBd; Tue, 24 Sep 2024 17:52:43 +0800 (CST) Received: from kwepemd100024.china.huawei.com (unknown [7.221.188.41]) by mail.maildlp.com (Postfix) with ESMTPS id EB09D140393; Tue, 24 Sep 2024 17:53:42 +0800 (CST) Received: from ubuntu-20-04.huawei.com (10.175.103.91) by kwepemd100024.china.huawei.com (7.221.188.41) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 24 Sep 2024 17:53:42 +0800 From: Wei Li To: Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Daniel Bristot de Oliveira CC: , Subject: [PATCH 2/5] tracing/timerlat: Drop interface_lock in stop_kthread() Date: Tue, 24 Sep 2024 17:45:12 +0800 Message-ID: <20240924094515.3561410-3-liwei391@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240924094515.3561410-1-liwei391@huawei.com> References: <20240924094515.3561410-1-liwei391@huawei.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To kwepemd100024.china.huawei.com 
(7.221.188.41)

stop_kthread() is the offline callback for "trace/osnoise:online". Since
commit 5bfbcd1ee57b ("tracing/timerlat: Add interface_lock around clearing
of kthread in stop_kthread()"), the following ABBA deadlock scenario has
been possible:

T1                            | T2 [BP]               | T3 [AP]
osnoise_hotplug_workfn()      | work_for_cpu_fn()     | cpuhp_thread_fun()
                              |   _cpu_down()         |   osnoise_cpu_die()
  mutex_lock(&interface_lock) |                       |     stop_kthread()
                              |     cpus_write_lock() |       mutex_lock(&interface_lock)
  cpus_read_lock()            |     cpuhp_kick_ap()   |

As interface_lock here only protects the "kthread" field of the osn_var,
use xchg() instead to fix this issue. Also switch stop_per_cpu_kthreads()
back to for_each_online_cpu(), as it can take cpus_read_lock() again.

Fixes: 5bfbcd1ee57b ("tracing/timerlat: Add interface_lock around clearing of kthread in stop_kthread()")
Signed-off-by: Wei Li
---
 kernel/trace/trace_osnoise.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 934a14bc72e6..ddc9afb9b7d4 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -1953,12 +1953,8 @@ static void stop_kthread(unsigned int cpu)
 {
 	struct task_struct *kthread;
 
-	mutex_lock(&interface_lock);
-	kthread = per_cpu(per_cpu_osnoise_var, cpu).kthread;
+	kthread = xchg_relaxed(&(per_cpu(per_cpu_osnoise_var, cpu).kthread), NULL);
 	if (kthread) {
-		per_cpu(per_cpu_osnoise_var, cpu).kthread = NULL;
-		mutex_unlock(&interface_lock);
-
 		if (cpumask_test_and_clear_cpu(cpu, &kthread_cpumask) &&
 		    !WARN_ON(!test_bit(OSN_WORKLOAD, &osnoise_options))) {
 			kthread_stop(kthread);
@@ -1972,7 +1968,6 @@ static void stop_kthread(unsigned int cpu)
 			put_task_struct(kthread);
 		}
 	} else {
-		mutex_unlock(&interface_lock);
 		/* if no workload, just return */
 		if (!test_bit(OSN_WORKLOAD, &osnoise_options)) {
 			/*
@@ -1994,8 +1989,12 @@ static void stop_per_cpu_kthreads(void)
 {
 	int cpu;
 
-	for_each_possible_cpu(cpu)
+	cpus_read_lock();
+
+	for_each_online_cpu(cpu)
 		stop_kthread(cpu);
+
+	cpus_read_unlock();
 }
 
 /*

From patchwork Tue Sep 24 09:45:13 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "liwei (GF)" X-Patchwork-Id: 13810620 Received: from szxga04-in.huawei.com (szxga04-in.huawei.com [45.249.212.190]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1A39315884D for ; Tue, 24 Sep 2024 09:53:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.190 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727171634; cv=none; b=EcZ4ovIuXZb+hTQkZoxPiVZ/hjjMoTvL/hAF3kF4LOJIhOtHOItVNEJHoRRTiWNEW4XRVhZVsY2NK5pLU0xfdNfnP186umRkmmlDxIbNL/fkJOxgYAwOehmIJum1c6arqY+4d0fvSFJZcxA4GEsrUwcc+X6jALwxJEzaNCZlvfs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727171634; c=relaxed/simple; bh=EC8X3+UaNzudnYqucqUr2vw07O9Tg8gwI4s0TE74nUg=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=a6yI+QiQ8W93yPpFw86cON+ix7DRCWL5WMWIq8x4bU+tf2Sjr2Ai8gVyf5KxtfNgeNE0CmocpNd5TdJHpVeZB1dFRLfSDCh4/d07cBnVrN8wcn9xBpXoCikEMa3ZhJxMPFuhY8uHJJ0zwgMkHayLknfHKCUS/XYsY5qpjPhNH2Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=45.249.212.190 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.19.88.234]) by szxga04-in.huawei.com (SkyGuard) with ESMTP id 4XCZtW358Bz20pNW; Tue, 24 Sep 2024 17:53:23 +0800 (CST) Received: from kwepemd100024.china.huawei.com (unknown [7.221.188.41]) by mail.maildlp.com (Postfix) with ESMTPS 
id 93CDC140119; Tue, 24 Sep 2024 17:53:43 +0800 (CST) Received: from ubuntu-20-04.huawei.com (10.175.103.91) by kwepemd100024.china.huawei.com (7.221.188.41) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 24 Sep 2024 17:53:42 +0800 From: Wei Li To: Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Daniel Bristot de Oliveira CC: , Subject: [PATCH 3/5] tracing/timerlat: Fix a race during cpuhp processing Date: Tue, 24 Sep 2024 17:45:13 +0800 Message-ID: <20240924094515.3561410-4-liwei391@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240924094515.3561410-1-liwei391@huawei.com> References: <20240924094515.3561410-1-liwei391@huawei.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To kwepemd100024.china.huawei.com (7.221.188.41)

Another exception was found in which the "timerlat/1" thread was scheduled
on CPU0, eventually leading to timer corruption:

```
ODEBUG: init active (active state 0) object: ffff888237c2e108 object type: hrtimer hint: timerlat_irq+0x0/0x220
WARNING: CPU: 0 PID: 426 at lib/debugobjects.c:518 debug_print_object+0x7d/0xb0
Modules linked in:
CPU: 0 UID: 0 PID: 426 Comm: timerlat/1 Not tainted 6.11.0-rc7+ #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:debug_print_object+0x7d/0xb0
...
Call Trace:
 ? __warn+0x7c/0x110
 ? debug_print_object+0x7d/0xb0
 ? report_bug+0xf1/0x1d0
 ? prb_read_valid+0x17/0x20
 ? handle_bug+0x3f/0x70
 ? exc_invalid_op+0x13/0x60
 ? asm_exc_invalid_op+0x16/0x20
 ? debug_print_object+0x7d/0xb0
 ? debug_print_object+0x7d/0xb0
 ? __pfx_timerlat_irq+0x10/0x10
 __debug_object_init+0x110/0x150
 hrtimer_init+0x1d/0x60
 timerlat_main+0xab/0x2d0
 ? __pfx_timerlat_main+0x10/0x10
 kthread+0xb7/0xe0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2d/0x40
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
```

Tracing the scheduling events showed that the "timerlat/1" thread was
migrated during thread creation. Further analysis confirmed the cause: the
CPU online processing for osnoise is implemented through workers, which run
asynchronously with respect to the offline processing. By the time the
worker is scheduled to create a thread, the CPU may have already been
removed from the cpu_online_mask by the offline process, so the right CPU
cannot be selected:

T1                         | T2
[CPUHP_ONLINE]             | cpu_device_down()
osnoise_hotplug_workfn()   |
                           |   cpus_write_lock()
                           |   takedown_cpu(1)
                           |   cpus_write_unlock()
[CPUHP_OFFLINE]            |
  cpus_read_lock()         |
  start_kthread(1)         |
  cpus_read_unlock()       |

To fix this, skip online processing if the CPU is already offline.

Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations")
Signed-off-by: Wei Li
---
 kernel/trace/trace_osnoise.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index ddc9afb9b7d4..6ed4008e6d62 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2097,6 +2097,8 @@ static void osnoise_hotplug_workfn(struct work_struct *dummy)
 	mutex_lock(&interface_lock);
 	cpus_read_lock();
 
+	if (!cpu_online(cpu))
+		goto out_unlock;
 	if (!cpumask_test_cpu(cpu, &osnoise_cpumask))
 		goto out_unlock;
 

From patchwork Tue Sep 24 09:45:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "liwei (GF)" X-Patchwork-Id: 13810618 Received: from szxga05-in.huawei.com (szxga05-in.huawei.com [45.249.212.191]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 23E4B154BE0 for ; Tue, 24 Sep 2024 09:53:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.191 ARC-Seal: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727171628; cv=none; b=bGztoIs3Z03ZuH3PriwGAJgZqNY24j71eyN8y7y+hIy1qLGeDp84Z/cYRnydaepUZ3X6zGjTP+gC+W5ghjjfWt/otsaa9cx81WjFtt4CA4e21R1zm1qcqOH7vLIZOLPVAz92eWfizRQbNULFXiY+aplWGK3NTMI4iccsSc403Uo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727171628; c=relaxed/simple; bh=uIZTQqwRLlVoS24DHh587irsQCRGnKJbXqjbYpbUsuU=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=RF8L/0CoLDSrnb8ZpqYEPkJf0psVz+1mNbiI3QjfqcGwN7E5LKtiDR1Q3tuEoRuRZy/wKGe3mCiA91rTneW+0xRhhhWYDDz3vJNid/F4DocYmIWhJlbZqZ+GvtIymoQErZL26Q0Rti2PkuPcRzAGhh0k24kaUd1kfKRpVWqNPu8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=45.249.212.191 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.19.88.234]) by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4XCZt23F66z2QTxR; Tue, 24 Sep 2024 17:52:58 +0800 (CST) Received: from kwepemd100024.china.huawei.com (unknown [7.221.188.41]) by mail.maildlp.com (Postfix) with ESMTPS id 47995140119; Tue, 24 Sep 2024 17:53:44 +0800 (CST) Received: from ubuntu-20-04.huawei.com (10.175.103.91) by kwepemd100024.china.huawei.com (7.221.188.41) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 24 Sep 2024 17:53:43 +0800 From: Wei Li To: Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Daniel Bristot de Oliveira CC: , Subject: [PATCH 4/5] tracing/hwlat: Fix a race during cpuhp processing Date: Tue, 24 Sep 2024 17:45:14 +0800 Message-ID: <20240924094515.3561410-5-liwei391@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240924094515.3561410-1-liwei391@huawei.com> 
References: <20240924094515.3561410-1-liwei391@huawei.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To kwepemd100024.china.huawei.com (7.221.188.41)

In theory, the cpuhp online/offline processing race also exists in the
per-CPU mode of the hwlat tracer. Apply the same fix there.

Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
Signed-off-by: Wei Li
---
 kernel/trace/trace_hwlat.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index b791524a6536..3bd6071441ad 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -520,6 +520,8 @@ static void hwlat_hotplug_workfn(struct work_struct *dummy)
 	if (!hwlat_busy || hwlat_data.thread_mode != MODE_PER_CPU)
 		goto out_unlock;
 
+	if (!cpu_online(cpu))
+		goto out_unlock;
 	if (!cpumask_test_cpu(cpu, tr->tracing_cpumask))
 		goto out_unlock;
 

From patchwork Tue Sep 24 09:45:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "liwei (GF)" X-Patchwork-Id: 13810621 X-Patchwork-Delegate: rostedt@goodmis.org Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [45.249.212.189]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D1181482D8 for ; Tue, 24 Sep 2024 09:53:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.189 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727171634; cv=none; b=U8B6rcjuR2q6MvBHtaJuZwttzeW2bUS7pqETpTD+NOLbx/XYVKldS9HbxSrLSr6AQWhkNmXAg2SMEZBmy4fRBD1oon0V+bFGGbHPb7JXZaM19zY6+TUUi3IyxfTeKpRqk0RXoOfU3vzUH+pHR9/yqioeEVbwg8UERxaYV9wKXrw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727171634; 
c=relaxed/simple; bh=pHR0UkvKEGYhOkAyV8FTZLacUpWReaRNRwQOAV91g5k=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=lcZf/4A8fNo7kDONPZ3AIEOKdjyA5iql79Uqm5wKMMCSUV1EdXHbnfRvzEJrzHZNKyp5yuv4gyD4upXVRnz8VXoE687hdZgxuYYL7q4gYshEwsrO4wVKPEEygEP3T+MUpunOQWQmz/E0o4VL5k2X2kvhRFWpUUhIAqmuQk/VLSM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=45.249.212.189 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.19.163.48]) by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4XCZtX3l51zQrQZ; Tue, 24 Sep 2024 17:53:24 +0800 (CST) Received: from kwepemd100024.china.huawei.com (unknown [7.221.188.41]) by mail.maildlp.com (Postfix) with ESMTPS id DF6AF180087; Tue, 24 Sep 2024 17:53:44 +0800 (CST) Received: from ubuntu-20-04.huawei.com (10.175.103.91) by kwepemd100024.china.huawei.com (7.221.188.41) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1544.11; Tue, 24 Sep 2024 17:53:44 +0800 From: Wei Li To: Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Daniel Bristot de Oliveira CC: , Subject: [PATCH 5/5] tracing/hwlat: Fix deadlock in cpuhp processing Date: Tue, 24 Sep 2024 17:45:15 +0800 Message-ID: <20240924094515.3561410-6-liwei391@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240924094515.3561410-1-liwei391@huawei.com> References: <20240924094515.3561410-1-liwei391@huawei.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To kwepemd100024.china.huawei.com (7.221.188.41)

Another "hung task" error was reported during the test, and I figured out
that the deadlock scenario is as follows:

T1 [BP]             | T2 [AP]                | T3 [hwlatd/1]                | T4
work_for_cpu_fn()   | cpuhp_thread_fun()     | kthread_fn()                 | hwlat_hotplug_workfn()
_cpu_down()         | stop_cpu_kthread()     |                              | mutex_lock(&hwlat_data.lock)
cpus_write_lock()   | kthread_stop(hwlatd/1) | mutex_lock(&hwlat_data.lock) |
__cpuhp_kick_ap()   | wait_for_completion()  |                              | cpus_read_lock()

This constitutes an indirect ABBA deadlock between "cpu_hotplug_lock" and
"hwlat_data.lock". Make the mutex acquisition in kthread_fn() interruptible
to fix this.

Fixes: ba998f7d9531 ("trace/hwlat: Support hotplug operations")
Signed-off-by: Wei Li
---
 kernel/trace/trace_hwlat.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index 3bd6071441ad..4c228ccb8a38 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -370,7 +370,8 @@ static int kthread_fn(void *data)
 		get_sample();
 		local_irq_enable();
 
-		mutex_lock(&hwlat_data.lock);
+		if (mutex_lock_interruptible(&hwlat_data.lock))
+			break;
 		interval = hwlat_data.sample_window - hwlat_data.sample_width;
 		mutex_unlock(&hwlat_data.lock);
 
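
For illustration only, the control-flow shape of this change (a worker loop that leaves instead of blocking indefinitely when it cannot take the lock) can be sketched in userspace C11. This is an analogy under stated assumptions, not kernel code: config_lock, stop_requested, lock_unless_stopping(), unlock_config() and read_interval() are hypothetical names, and the sketch swaps in an explicit stop flag with a simple test-and-set lock. It does not model how mutex_lock_interruptible() or kthread_stop() behave in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

/* Hypothetical stand-ins: a lock guarding the settings and a stop request. */
static atomic_flag config_lock = ATOMIC_FLAG_INIT;
static atomic_bool stop_requested;
static long sample_window = 1000000, sample_width = 500000;

/*
 * Try to take config_lock, but give up once a stop has been requested.
 * Returns 0 with the lock held, or -1 without it.
 */
static int lock_unless_stopping(void)
{
	while (atomic_flag_test_and_set_explicit(&config_lock,
						 memory_order_acquire)) {
		if (atomic_load(&stop_requested))
			return -1;
		sched_yield();	/* let the current lock holder make progress */
	}
	return 0;
}

static void unlock_config(void)
{
	atomic_flag_clear_explicit(&config_lock, memory_order_release);
}

/*
 * One loop iteration: read the interval under the lock.
 * Returns false when the worker should leave its loop.
 */
static bool read_interval(long *interval)
{
	if (lock_unless_stopping())
		return false;
	*interval = sample_window - sample_width;
	unlock_config();
	return true;
}
```

The point of the shape is that the thread being waited on never sleeps unconditionally on a lock whose holder may itself be waiting, directly or indirectly, for that thread to exit.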