From: Yafang Shao <laoar.shao@gmail.com>
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao, Dave Chinner
Subject: [PATCH] hung_task: fix missing hung task detection for kthread in TASK_WAKEKILL state
Date: Mon, 23 Dec 2024 17:37:22 +0800
Message-Id: <20241223093722.78570-1-laoar.shao@gmail.com>

We recently encountered an XFS deadlock issue, which is a known problem
resolved in the upstream kernel [0].
During the analysis of this issue, I observed that a kernel thread in
the TASK_WAKEKILL state could not be detected as a hung task by the
hung_task detector. The details are as follows:

Using the following command, I identified nine tasks stuck in the D
state:

  $ ps -eLo state,comm,tid,wchan | grep ^D
  D java            4177339 xfs_buf_lock
  D kworker/93:3+xf 3025535 xfs_buf_lock
  D kworker/87:0+xf 3426612 xfs_extent_busy_flush
  D kworker/85:0+xf 3479378 xfs_buf_lock
  D kworker/91:1+xf 3584478 xfs_buf_lock
  D kworker/80:3+xf 3655680 xfs_buf_lock
  D kworker/89:0+xf 3671691 xfs_buf_lock
  D kworker/84:1+xf 3708397 xfs_buf_lock
  D kworker/81:1+xf 4005763 xfs_buf_lock

However, the hung_task detector only reported eight of these tasks:

  [3108840.650652] INFO: task java:4177339 blocked for more than 247779 seconds.
  [3108840.654197] INFO: task kworker/93:3:3025535 blocked for more than 248427 seconds.
  [3108840.657711] INFO: task kworker/85:0:3479378 blocked for more than 247836 seconds.
  [3108840.661483] INFO: task kworker/91:1:3584478 blocked for more than 249638 seconds.
  [3108840.664871] INFO: task kworker/80:3:3655680 blocked for more than 249638 seconds.
  [3108840.668495] INFO: task kworker/89:0:3671691 blocked for more than 249047 seconds.
  [3108840.672418] INFO: task kworker/84:1:3708397 blocked for more than 247836 seconds.
  [3108840.676175] INFO: task kworker/81:1:4005763 blocked for more than 247836 seconds.

Task 3426612, although in the D state, was not reported as a hung task.

I confirmed that task 3426612 remained in the D (disk sleep) state and
experienced no context switches over a long period:

  $ cat /proc/3426612/status | grep -E "State:|ctxt_switches:"; \
    sleep 60; echo "----"; \
    cat /proc/3426612/status | grep -E "State:|ctxt_switches:"
  State:  D (disk sleep)
  voluntary_ctxt_switches:        7516
  nonvoluntary_ctxt_switches:     0
  ----
  State:  D (disk sleep)
  voluntary_ctxt_switches:        7516
  nonvoluntary_ctxt_switches:     0

The system's hung_task detector settings were configured as follows:

  kernel.hung_task_timeout_secs = 28
  kernel.hung_task_warnings = -1

The issue lies in the handling of task state in the XFS code.
Specifically, the thread in question (3426612) was set to the
TASK_KILLABLE state in xfs_extent_busy_flush():

  xfs_extent_busy_flush
    prepare_to_wait(&pag->pagb_wait, &wait, TASK_KILLABLE);

When a task has TASK_WAKEKILL set (one of the two bits that make up
TASK_KILLABLE, alongside TASK_UNINTERRUPTIBLE), the hung_task detector
ignores it, as it assumes such tasks can be terminated. However, in this
case, the task is a kernel thread and cannot be killed, meaning it
effectively becomes a hung task.

To address this issue, the hung_task detector should report kthreads in
the TASK_WAKEKILL state.

Link: https://lore.kernel.org/linux-xfs/20230620002021.1038067-5-david@fromorbit.com/ [0]
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Dave Chinner
---
 kernel/hung_task.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index c18717189f32..ed63fd84ce2e 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -220,8 +220,9 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		 */
 		state = READ_ONCE(t->__state);
 		if ((state & TASK_UNINTERRUPTIBLE) &&
+		    (t->flags & PF_KTHREAD ||
 		    !(state & TASK_WAKEKILL) &&
-		    !(state & TASK_NOLOAD))
+		    !(state & TASK_NOLOAD)))
 			check_hung_task(t, timeout);
 	}
  unlock:
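
For illustration only, not part of the patch: a minimal user-space sketch of the candidate test in check_hung_uninterruptible_tasks() before and after this change. The constant values are copied from include/linux/sched.h as found in recent kernels and should be treated as assumptions; the names candidate_old(), candidate_new() and predicate_selfcheck() are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed values, mirroring include/linux/sched.h in recent kernels.
 * They are redefined here only so the sketch builds in user space.
 */
#define TASK_UNINTERRUPTIBLE	0x00000002u
#define TASK_WAKEKILL		0x00000100u
#define TASK_NOLOAD		0x00000400u
#define TASK_KILLABLE		(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_IDLE		(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
#define PF_KTHREAD		0x00200000u

/* Candidate test before the patch: flags are not consulted at all. */
bool candidate_old(unsigned int state, unsigned int flags)
{
	(void)flags;
	return (state & TASK_UNINTERRUPTIBLE) &&
	       !(state & TASK_WAKEKILL) &&
	       !(state & TASK_NOLOAD);
}

/*
 * Candidate test after the patch, with the implicit precedence of the
 * diff (a || (b && c)) written out explicitly.
 */
bool candidate_new(unsigned int state, unsigned int flags)
{
	return (state & TASK_UNINTERRUPTIBLE) &&
	       ((flags & PF_KTHREAD) ||
		(!(state & TASK_WAKEKILL) && !(state & TASK_NOLOAD)));
}

/* Hypothetical self-check covering the cases discussed in the changelog. */
void predicate_selfcheck(void)
{
	/* kthread sleeping in TASK_KILLABLE: skipped before, checked after */
	assert(!candidate_old(TASK_KILLABLE, PF_KTHREAD));
	assert(candidate_new(TASK_KILLABLE, PF_KTHREAD));

	/* user task in TASK_KILLABLE: still skipped, it can be killed */
	assert(!candidate_new(TASK_KILLABLE, 0));

	/* plain TASK_UNINTERRUPTIBLE: checked in both versions */
	assert(candidate_old(TASK_UNINTERRUPTIBLE, 0));
	assert(candidate_new(TASK_UNINTERRUPTIBLE, 0));

	/* as written, the kthread branch also bypasses the TASK_NOLOAD test */
	assert(candidate_new(TASK_IDLE, PF_KTHREAD));
}
```

Calling predicate_selfcheck() from a small main() should return silently if the sketch matches the description above. The sketch models only which tasks are passed to check_hung_task(); the context-switch and timeout checks performed there are not modelled.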